=====
Usage
=====

To use PDBNucleicAcids in a project:

.. code-block:: python

    import PDBNucleicAcids


Build All Strands of Nucleic Acids
----------------------------------

PDBNucleicAcids can parse all strands of nucleic acids in a Biopython structure.

.. code-block:: python

    from Bio.PDB.PDBList import PDBList
    from Bio.PDB.MMCIFParser import MMCIFParser
    from PDBNucleicAcids.NucleicAcid import NABuilder

    # retrive file from PDB using Biopython
    pdbl = PDBList()
    pdbl.retrieve_pdb_file(pdb_code="1A02", pdir=".")
    pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
    # ... or else use your own

    # parse and build structure with Biopython
    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="1A02", filename="1a02-assembly1.cif"
    )

    # build all nucleic acids
    builder = NABuilder()
    na_list = builder.build_nucleic_acids(entity=structure)
    na_list

.. code-block:: console

    [<NucleicAcid chain='A' type='DNA' start=4001 end=4020>,
     <NucleicAcid chain='B' type='DNA' start=5001 end=5020>]

Every nucleic acid is like a Python list:

.. code-block:: python

    na = na_list[0]
    na[:5]

.. code-block:: console

    [<Residue DT het=  resseq=4001 icode= >,
     <Residue DT het=  resseq=4002 icode= >,
     <Residue DG het=  resseq=4003 icode= >,
     <Residue DG het=  resseq=4004 icode= >,
     <Residue DA het=  resseq=4005 icode= >]

PDBNucleicAcids can get a nucleic acid sequence:

.. code-block:: python

    na.get_seq()

.. code-block:: console

    Seq('TTGGAAAATTTGTTTCATAG')

PDBNucleicAcids can also get a nucleic acid chain id, nucleic acid type and
all atoms:

.. code-block:: python

    print(na.get_chain_id(), na.get_na_type())
    print(na.get_atoms()[:5])

.. code-block:: console

    A DNA
    [<Atom O5'>, <Atom C5'>, <Atom C4'>, <Atom O4'>, <Atom C3'>]


Build All Double-Stranded Nucleic Acids
---------------------------------------

PDBNucleicAcids can parse all double-stranded nucleic acids in a Biopython
structure.

.. code-block:: python

    from PDBNucleicAcids.NucleicAcid import DSNABuilder

    builder = DSNABuilder()
    dsna_list = builder.build_double_strands(structure)
    dsna_list

.. code-block:: console

    [<DoubleStrandNucleicAcid type='dsDNA' i-th strand='A'
     j-th strand='B' length=17>]


Get All Base-Pairs
------------------

PDBNucleicAcids can extract all base-pairs object in a double-stranded nucleic
acid. Double straded nucleic acids are like a list of base-pairs:

.. code-block:: python

    dsna = dsna_list[0]
    dsna[:5]

.. code-block:: console

    [<BasePair i_res=DG j_res=DC>,
     <BasePair i_res=DG j_res=DC>,
     <BasePair i_res=DA j_res=DT>,
     <BasePair i_res=DA j_res=DT>,
     <BasePair i_res=DA j_res=DT>]

PDBNucleicAcids can extract all base-pairs data in a double-stranded
nucleic acid.

.. code-block:: python

    dsna = dsna_list[0]
    df = dsna.as_dataframe()
    df.head()

.. code-block:: console

      i_chain_id  i_residue_index  ... j_residue_index j_chain_id
    0          A             4003  ...            5020          B
    1          A             4004  ...            5019          B
    2          A             4005  ...            5018          B
    3          A             4006  ...            5017          B
    4          A             4007  ...            5016          B


Search Individual Pair Bases
----------------------------

PDBNucleicAcids can search for paired nucleotide, given an input nucleotide.

.. code-block:: python

    from PDBNucleicAcids.NucleicAcid import search_paired_base

    # input nucleotide
    base = structure[0]["A"][4003]  # DG

    # search for paired nucleotide
    paired_base = search_paired_base(base)
    paired_base

.. code-block:: console

    <Residue DC het=  resseq=5020 icode= >

PDBNucleicAcids will recognize unpaired bases.

.. code-block:: python

    # input nucleotide
    base = structure[0]["A"][4001]  # DT

    # search for paired nucleotide
    paired_base = search_paired_base(base)
    print(paired_base)

.. code-block:: console

    None


DNA-RNA Complexes
-----------------

PDBNucleicAcids base-pairing can be used for DNA-RNA base-pairs.

.. code-block:: python

    from Bio.PDB.PDBList import PDBList
    from Bio.PDB.MMCIFParser import MMCIFParser
    from PDBNucleicAcids.NucleicAcid import search_paired_base

    # retrive file from PDB using Biopython
    pdbl = PDBList()
    pdbl.retrieve_assembly_file(pdb_code="9K7R", assembly_num=1, pdir=".")

    # parse and build structure with Biopython
    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="9K7R", filename="9k7r-assembly1.cif"
    )

    # input nucleotide
    base = structure[0]["B"][8]  # DT

    # search for paired nucleotide
    paired_base = search_paired_base(base)

    # paired base is RNA base
    paired_base

.. code-block:: console

    <Residue A het=  resseq=2 icode= >


Custom Rules for Base-Pairing
-----------------------------

PDBNucleicAcids base-pairing can be expanded, by changing parameters used in the
base-pairing rules.

.. code-block:: python

    from PDBNucleicAcids.BasePairRules import dsDNAWatsonCrickBasePairRules

    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="1A02", filename="1a02-assembly1.cif"
    )

    # custom base pairing rules
    my_rules = dsDNAWatsonCrickBasePairRules(
        max_distance = 3.5,
        max_angle = 60,
        max_stagger = 2.0,
    )

    # input nucleotide
    base = structure[0]["A"][4003]  # DG

    # search for paired nucleotide
    paired_base = search_paired_base(base, pairing_rules=my_rules)


PDBNucleicAcids base-pairing can be expanded even further by creating your own
base-pairing rules.


.. code-block:: python

    from PDBNucleicAcids.BasePairRules import WatsonCrickBasePairRules

    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="1A02", filename="1a02-assembly1.cif"
    )

    # input nucleotide
    base = structure[0]["A"][1]  # G

    # search for paired nucleotide with default rules
    pairing_rules = WatsonCrickBasePairRules()
    paired_base = search_paired_base(base, pairing_rules=pairing_rules)
    # this returns None because it binds a non-standard DNA base: 5CM

    # to circumvent this we can code our own rules
    class MyRules(WatsonCrickBasePairRules):
        def __init__(self):
            super().__init__()

            self.complementary_pairs += [("5CM", "G"), ("G", "5CM")]

            self.pyrimidines.append("5CM")

            self.accepted_nucleotides.append("5CM")

    # search for paired nucleotide with custom base pairing rules
    pairing_rules = MyRules()
    paired_base = search_paired_base(base, pairing_rules=pairing_rules)
    paired_base

.. code-block:: console

    <Residue 5CM het=H_5CM resseq=9 icode= >

Better extensions may be implemented in a later version:

.. code-block:: python

    class MyRules(WatsonCrickBasePairRules):
        """Class for extending base-pairing rules based on our data."""

        def __init__(self):
            super().__init__()

            self.complementary_pairs += [
                ("DU", "DA"),
                ("DA", "DU"),
                ("DG", "DOC"),
                ("DOC", "DG"),
                ("DG", "CBR"),
                ("CBR", "DG"),
                ("DG", "C7S"),
                ("C7S", "DG"),
                ("DG", "C7R"),
                ("C7R", "DG"),
                ("DG", "C38"),
                ("C38", "DG"),
                ("DG", "C2S"),
                ("C2S", "DG"),
                ("DA", "BRU"),
                ("BRU", "DA"),
                ("DT", "AS"),
                ("AS", "DT"),
                ("DA", "PST"),
                ("PST", "DA"),
                ("DA", "5IU"),
                ("5IU", "DA"),
                ("DG", "5HC"),
                ("5HC", "DG"),
                ("DG", "5FC"),
                ("5FC", "DG"),
                ("DG", "5CM"),
                ("5CM", "DG"),
                ("DT", "2PR"),
                ("2PR", "DT"),
                ("DG", "1CC"),
                ("1CC", "DG"),
                ("DT", "1AP"),
                ("1AP", "DT"),
            ]

            pyrimidines = [
                "DOC",
                "CBR",
                "C7S",
                "C7R",
                "C38",
                "C2S",
                "5HC",
                "5FC",
                "5CM",
                "1CC",
                "BRU",
                "PST",
                "5IU",
            ]

            purines = ["AS", "2PR", "1AP"]

            self.pyrimidines.extend(pyrimidines)

            self.purines.extend(purines)

            self.accepted_nucleotides.extend(pyrimidines + purines)


Limitations
-----------

PDBNucleicAcids doesn't support yet recognition of flipped bases,
gaps and nicks.