===== Usage ===== To use PDBNucleicAcids in a project: .. code-block:: python import PDBNucleicAcids Build All Strands of Nucleic Acids ---------------------------------- PDBNucleicAcids can parse all strands of nucleic acids in a Biopython structure. .. code-block:: python from Bio.PDB.PDBList import PDBList from Bio.PDB.MMCIFParser import MMCIFParser from PDBNucleicAcids.NucleicAcid import NABuilder # retrive file from PDB using Biopython pdbl = PDBList() pdbl.retrieve_pdb_file(pdb_code="1A02", pdir=".") pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".") # ... or else use your own # parse and build structure with Biopython parser = MMCIFParser() structure = parser.get_structure( structure_id="1A02", filename="1a02-assembly1.cif" ) # build all nucleic acids builder = NABuilder() na_list = builder.build_nucleic_acids(entity=structure) na_list .. code-block:: console [, ] Every nucleic acid is like a Python list: .. code-block:: python na = na_list[0] na[:5] .. code-block:: console [, , , , ] PDBNucleicAcids can get a nucleic acid sequence: .. code-block:: python na.get_seq() .. code-block:: console Seq('TTGGAAAATTTGTTTCATAG') PDBNucleicAcids can also get a nucleic acid chain id, nucleic acid type and all atoms: .. code-block:: python print(na.get_chain_id(), na.get_na_type()) print(na.get_atoms()[:5]) .. code-block:: console A DNA [, , , , ] Build All Double-Stranded Nucleic Acids --------------------------------------- PDBNucleicAcids can parse all double-stranded nucleic acids in a Biopython structure. .. code-block:: python from PDBNucleicAcids.NucleicAcid import DSNABuilder builder = DSNABuilder() dsna_list = builder.build_double_strands(structure) dsna_list .. code-block:: console [] Get All Base-Pairs ------------------ PDBNucleicAcids can extract all base-pairs object in a double-stranded nucleic acid. Double straded nucleic acids are like a list of base-pairs: .. code-block:: python dsna = dsna_list[0] dsna[:5] .. code-block:: console [, , , , ] PDBNucleicAcids can extract all base-pairs data in a double-stranded nucleic acid. .. code-block:: python dsna = dsna_list[0] df = dsna.as_dataframe() df.head() .. code-block:: console i_chain_id i_residue_index ... j_residue_index j_chain_id 0 A 4003 ... 5020 B 1 A 4004 ... 5019 B 2 A 4005 ... 5018 B 3 A 4006 ... 5017 B 4 A 4007 ... 5016 B Search Individual Pair Bases ---------------------------- PDBNucleicAcids can search for paired nucleotide, given an input nucleotide. .. code-block:: python from PDBNucleicAcids.NucleicAcid import search_paired_base # input nucleotide base = structure[0]["A"][4003] # DG # search for paired nucleotide paired_base = search_paired_base(base) paired_base .. code-block:: console PDBNucleicAcids will recognize unpaired bases. .. code-block:: python # input nucleotide base = structure[0]["A"][4001] # DT # search for paired nucleotide paired_base = search_paired_base(base) print(paired_base) .. code-block:: console None DNA-RNA Complexes ----------------- PDBNucleicAcids base-pairing can be used for DNA-RNA base-pairs. .. code-block:: python from Bio.PDB.PDBList import PDBList from Bio.PDB.MMCIFParser import MMCIFParser from PDBNucleicAcids.NucleicAcid import search_paired_base # retrive file from PDB using Biopython pdbl = PDBList() pdbl.retrieve_assembly_file(pdb_code="9K7R", assembly_num=1, pdir=".") # parse and build structure with Biopython parser = MMCIFParser() structure = parser.get_structure( structure_id="9K7R", filename="9k7r-assembly1.cif" ) # input nucleotide base = structure[0]["B"][8] # DT # search for paired nucleotide paired_base = search_paired_base(base) # paired base is RNA base paired_base .. code-block:: console Custom Rules for Base-Pairing ----------------------------- PDBNucleicAcids base-pairing can be expanded, by changing parameters used in the base-pairing rules. .. code-block:: python from PDBNucleicAcids.BasePairRules import dsDNAWatsonCrickBasePairRules parser = MMCIFParser() structure = parser.get_structure( structure_id="1A02", filename="1a02-assembly1.cif" ) # custom base pairing rules my_rules = dsDNAWatsonCrickBasePairRules( max_distance = 3.5, max_angle = 60, max_stagger = 2.0, ) # input nucleotide base = structure[0]["A"][4003] # DG # search for paired nucleotide paired_base = search_paired_base(base, pairing_rules=my_rules) PDBNucleicAcids base-pairing can be expanded even further by creating your own base-pairing rules. .. code-block:: python from PDBNucleicAcids.BasePairRules import WatsonCrickBasePairRules parser = MMCIFParser() structure = parser.get_structure( structure_id="1A02", filename="1a02-assembly1.cif" ) # input nucleotide base = structure[0]["A"][1] # G # search for paired nucleotide with default rules pairing_rules = WatsonCrickBasePairRules() paired_base = search_paired_base(base, pairing_rules=pairing_rules) # this returns None because it binds a non-standard DNA base: 5CM # to circumvent this we can code our own rules class MyRules(WatsonCrickBasePairRules): def __init__(self): super().__init__() self.complementary_pairs += [("5CM", "G"), ("G", "5CM")] self.pyrimidines.append("5CM") self.accepted_nucleotides.append("5CM") # search for paired nucleotide with custom base pairing rules pairing_rules = MyRules() paired_base = search_paired_base(base, pairing_rules=pairing_rules) paired_base .. code-block:: console Better extensions may be implemented in a later version: .. code-block:: python class MyRules(WatsonCrickBasePairRules): """Class for extending base-pairing rules based on our data.""" def __init__(self): super().__init__() self.complementary_pairs += [ ("DU", "DA"), ("DA", "DU"), ("DG", "DOC"), ("DOC", "DG"), ("DG", "CBR"), ("CBR", "DG"), ("DG", "C7S"), ("C7S", "DG"), ("DG", "C7R"), ("C7R", "DG"), ("DG", "C38"), ("C38", "DG"), ("DG", "C2S"), ("C2S", "DG"), ("DA", "BRU"), ("BRU", "DA"), ("DT", "AS"), ("AS", "DT"), ("DA", "PST"), ("PST", "DA"), ("DA", "5IU"), ("5IU", "DA"), ("DG", "5HC"), ("5HC", "DG"), ("DG", "5FC"), ("5FC", "DG"), ("DG", "5CM"), ("5CM", "DG"), ("DT", "2PR"), ("2PR", "DT"), ("DG", "1CC"), ("1CC", "DG"), ("DT", "1AP"), ("1AP", "DT"), ] pyrimidines = [ "DOC", "CBR", "C7S", "C7R", "C38", "C2S", "5HC", "5FC", "5CM", "1CC", "BRU", "PST", "5IU", ] purines = ["AS", "2PR", "1AP"] self.pyrimidines.extend(pyrimidines) self.purines.extend(purines) self.accepted_nucleotides.extend(pyrimidines + purines) Limitations ----------- PDBNucleicAcids doesn't support yet recognition of flipped bases, gaps and nicks.