Usage

To use PDBNucleicAcids in a project:

import PDBNucleicAcids

Build All Strands of Nucleic Acids

PDBNucleicAcids can parse all strands of nucleic acids in a Biopython structure.

from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import NABuilder

# retrieve file from PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_pdb_file(pdb_code="1A02", pdir=".")
pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
# ... or else use your own

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

# build all nucleic acids
builder = NABuilder()
na_list = builder.build_nucleic_acids(structure)
na_list
[<NucleicAcid chain='A' type='DNA' start=4001 end=4020>,
 <NucleicAcid chain='B' type='DNA' start=5001 end=5020>]

Each nucleic acid behaves like a Python list of residues, so it supports indexing and slicing:

na = na_list[0]
na[:5]
[<Residue DT het=  resseq=4001 icode= >,
 <Residue DT het=  resseq=4002 icode= >,
 <Residue DG het=  resseq=4003 icode= >,
 <Residue DG het=  resseq=4004 icode= >,
 <Residue DA het=  resseq=4005 icode= >]

PDBNucleicAcids can return the sequence of a nucleic acid:

na.get_seq()
Seq('TTGGAAAATTTGTTTCATAG')
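
To illustrate how a sequence relates to the residue names shown earlier, here is a minimal sketch that maps PDB residue names to one-letter codes. The helper residue_names_to_sequence and the ONE_LETTER table are hypothetical names made up for this example; this is not necessarily how get_seq() is implemented.

```python
# Hypothetical helper for illustration only, not part of PDBNucleicAcids.
# Maps standard PDB residue names to one-letter nucleotide codes.
ONE_LETTER = {
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",  # DNA residue names
    "A": "A", "C": "C", "G": "G", "U": "U",      # RNA residue names
}


def residue_names_to_sequence(resnames, unknown="N"):
    """Return a one-letter sequence string for a list of residue names."""
    return "".join(ONE_LETTER.get(name.strip(), unknown) for name in resnames)


# Residue names of the first five residues of chain A shown above
first_five = ["DT", "DT", "DG", "DG", "DA"]
print(residue_names_to_sequence(first_five))  # TTGGA
```

Residue names missing from the table, such as modified bases, fall back to the placeholder "N" in this sketch.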

PDBNucleicAcids can also return the chain ID, the nucleic acid type and all atoms of a nucleic acid:

print(na.get_chain_id(), na.get_na_type())
print(na.get_atoms()[:5])
A DNA
[<Atom O5'>, <Atom C5'>, <Atom C4'>, <Atom O4'>, <Atom C3'>]

Build All Double-Stranded Nucleic Acids

PDBNucleicAcids can parse all double-stranded nucleic acids in a Biopython structure.

from PDBNucleicAcids.NucleicAcid import DSNABuilder

builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)
dsna_list
[<DoubleStrandNucleicAcid type='dsDNA' i-th strand='A'
 j-th strand='B' length=17>]

Get All Base-Pairs

PDBNucleicAcids can extract all base-pair objects in a double-stranded nucleic acid. A double-stranded nucleic acid behaves like a list of base-pairs:

dsna = dsna_list[0]
dsna[:5]
[<BasePair i_res=DG j_res=DC>,
 <BasePair i_res=DG j_res=DC>,
 <BasePair i_res=DA j_res=DT>,
 <BasePair i_res=DA j_res=DT>,
 <BasePair i_res=DA j_res=DT>]

PDBNucleicAcids can also export the base-pair data of a double-stranded nucleic acid as a dataframe:

dsna = dsna_list[0]
df = dsna.get_dataframe()
df.head()
  i_chain_id  i_residue_index  ... j_residue_index j_chain_id
0          A             4003  ...            5020          B
1          A             4004  ...            5019          B
2          A             4005  ...            5018          B
3          A             4006  ...            5017          B
4          A             4007  ...            5016          B
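
In the table above, the residue index on strand i increases while the index on strand j decreases, as expected for antiparallel strands. The following sketch checks that pattern on plain tuples copied from the table. The function is_antiparallel is a hypothetical helper written for this example and does not exist in PDBNucleicAcids.

```python
# Rows copied from the table above: (i_residue_index, j_residue_index)
pairs = [(4003, 5020), (4004, 5019), (4005, 5018), (4006, 5017), (4007, 5016)]


def is_antiparallel(pairs):
    """True if i indices strictly increase while j indices strictly decrease."""
    steps = zip(pairs, pairs[1:])  # consecutive rows, two at a time
    return all(i2 > i1 and j2 < j1 for (i1, j1), (i2, j2) in steps)


print(is_antiparallel(pairs))  # True
```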

Search Individual Paired Bases

PDBNucleicAcids can search for the nucleotide paired with a given input nucleotide.

from PDBNucleicAcids.NucleicAcid import search_paired_base

# input nucleotide
base = structure[0]["A"][4003]  # DG

# search for paired nucleotide
paired_base = search_paired_base(base)
paired_base
<Residue DC het=  resseq=5020 icode= >

PDBNucleicAcids recognizes unpaired bases: the search returns None when no partner is found.

# input nucleotide
base = structure[0]["A"][4001]  # DT

# search for paired nucleotide
paired_base = search_paired_base(base)
print(paired_base)
None
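
Because the search can return None, calling code should check the result before using it. The sketch below shows that pattern with a toy lookup table built from the residue numbers shown in this document. KNOWN_PAIRS and describe_pairing are hypothetical names, and the table lookup only stands in for the real search; it is not how search_paired_base works internally.

```python
# Toy lookup table built from values shown in this document.
# It stands in for a real search and is an assumption of this example.
KNOWN_PAIRS = {("A", 4003): ("B", 5020)}


def describe_pairing(chain_id, resseq):
    """Describe the pairing state of a residue, handling the unpaired case."""
    partner = KNOWN_PAIRS.get((chain_id, resseq))  # None when unpaired
    if partner is None:
        return f"{chain_id}:{resseq} is unpaired"
    return f"{chain_id}:{resseq} pairs with {partner[0]}:{partner[1]}"


print(describe_pairing("A", 4003))  # A:4003 pairs with B:5020
print(describe_pairing("A", 4001))  # A:4001 is unpaired
```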

DNA-RNA Complexes

Base-pairing in PDBNucleicAcids also works for DNA-RNA base-pairs.

from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import search_paired_base

# retrieve file from PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="9K7R", assembly_num=1, pdir=".")

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="9K7R", filename="9k7r-assembly1.cif"
)

# input nucleotide
base = structure[0]["B"][8]  # DT

# search for paired nucleotide
paired_base = search_paired_base(base)

# paired base is RNA base
paired_base
<Residue A het=  resseq=2 icode= >
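
In PDB files, standard DNA residues are named DA, DC, DG and DT, while standard RNA residues are named A, C, G and U. That is why the DT input above pairs with a residue named A. A minimal sketch of telling the two apart by residue name follows; na_type and is_hybrid_pair are hypothetical helpers for this example and are not part of PDBNucleicAcids.

```python
# Standard residue names used in PDB files
DNA_NAMES = {"DA", "DC", "DG", "DT"}
RNA_NAMES = {"A", "C", "G", "U"}


def na_type(resname):
    """Classify a residue name as 'DNA', 'RNA' or 'unknown'."""
    name = resname.strip()
    if name in DNA_NAMES:
        return "DNA"
    if name in RNA_NAMES:
        return "RNA"
    return "unknown"  # e.g. modified bases such as 5CM


def is_hybrid_pair(resname_i, resname_j):
    """True if one residue is DNA and the other is RNA."""
    return {na_type(resname_i), na_type(resname_j)} == {"DNA", "RNA"}


print(is_hybrid_pair("DT", "A"))   # True
print(is_hybrid_pair("DG", "DC"))  # False
```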

Custom Rules for Base-Pairing

The base-pairing criteria in PDBNucleicAcids can be relaxed or tightened by changing the parameters of the base-pairing rules.

from PDBNucleicAcids.BasePairRules import dsDNAWatsonCrickBasePairRules

parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

# custom base pairing rules
my_rules = dsDNAWatsonCrickBasePairRules(
    max_distance=3.5,
    max_angle=60,
    max_stagger=2.0,
)

# input nucleotide
base = structure[0]["A"][4003]  # DG

# search for paired nucleotide
paired_base = search_paired_base(base, pairing_rules=my_rules)
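
To give an intuition for a threshold such as max_distance, here is a minimal sketch of a distance cutoff between two atoms. It assumes the parameter is a maximum distance in ångström between two atoms; this page does not say which atoms or units PDBNucleicAcids uses, so treat that as an assumption. The function within_cutoff and the coordinates are made up for this example.

```python
import math


def within_cutoff(coord_a, coord_b, max_distance=3.5):
    """True if two 3D points are at most max_distance apart.

    Assumption: the cutoff is a plain Euclidean distance; the real
    criteria used by PDBNucleicAcids may differ.
    """
    return math.dist(coord_a, coord_b) <= max_distance


# Made-up coordinates for illustration
atom_1 = (0.0, 0.0, 0.0)
atom_near = (2.9, 0.0, 0.0)
atom_far = (6.0, 0.0, 0.0)

print(within_cutoff(atom_1, atom_near))  # True
print(within_cutoff(atom_1, atom_far))   # False
```

Raising the cutoff accepts more candidate pairs and lowering it accepts fewer, which is the trade-off to keep in mind when tuning such parameters.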

Base-pairing can be extended even further by writing your own base-pairing rules.

from PDBNucleicAcids.BasePairRules import WatsonCrickBasePairRules

parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

# input nucleotide
base = structure[0]["A"][1]  # G

# search for paired nucleotide with default rules
pairing_rules = WatsonCrickBasePairRules()
paired_base = search_paired_base(base, pairing_rules=pairing_rules)
# this returns None because the partner is a non-standard DNA base: 5CM

# to work around this, we can write our own rules
class MyRules(WatsonCrickBasePairRules):
    def __init__(self):
        super().__init__()

        self.complementary_pairs += [("5CM", "G"), ("G", "5CM")]

        self.pyrimidines.append("5CM")

        self.accepted_nucleotides.append("5CM")

# search for paired nucleotide with custom base pairing rules
pairing_rules = MyRules()
paired_base = search_paired_base(base, pairing_rules=pairing_rules)
paired_base
<Residue 5CM het=H_5CM resseq=9 icode= >
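
The subclassing pattern above can be tried without a structure file. The sketch below uses a stand-in base class, StandInRules, that only mimics the three attributes used in MyRules; the class, its default values and the are_complementary method are assumptions made for this example and do not describe the real WatsonCrickBasePairRules.

```python
class StandInRules:
    """Hypothetical stand-in for a rules class, for illustration only."""

    def __init__(self):
        # Assumed defaults; the real library may store these differently
        self.complementary_pairs = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")]
        self.pyrimidines = ["C", "U"]
        self.accepted_nucleotides = ["A", "C", "G", "U"]

    def are_complementary(self, name_i, name_j):
        """True if the two residue names form an accepted pair."""
        return (name_i, name_j) in self.complementary_pairs


class StandInRulesWith5CM(StandInRules):
    """Extends the stand-in rules with the modified base 5CM."""

    def __init__(self):
        super().__init__()
        self.complementary_pairs += [("5CM", "G"), ("G", "5CM")]
        self.pyrimidines.append("5CM")
        self.accepted_nucleotides.append("5CM")


print(StandInRules().are_complementary("G", "5CM"))         # False
print(StandInRulesWith5CM().are_complementary("G", "5CM"))  # True
```

Extending the lists inside __init__, after calling super().__init__(), keeps the defaults intact and leaves other rule instances unaffected.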

Limitations

PDBNucleicAcids does not yet support recognition of flipped bases, gaps and nicks.