=====
Usage
=====

To use PDBNucleicAcids in a project:

.. code-block:: python

    import PDBNucleicAcids

You can parse single stranded and double stranded nucleic acids.

.. code-block:: python

    from Bio.PDB.PDBList import PDBList
    from Bio.PDB.MMCIFParser import MMCIFParser
    from PDBNucleicAcids.NucleicAcid import DSNABuilder
    
    # retrive file from PDB using Biopython
    pdbl = PDBList()
    pdbl.retrieve_pdb_file(pdb_code="10MH", pdir=".")
    pdbl.retrieve_assembly_file(pdb_code="10MH", assembly_num=1, pdir=".")
    # ... or else use your own
    
    # parse and build structure with Biopython
    parser = MMCIFParser()
    structure = parser.get_structure(
         structure_id="10MH", filename="10mh-assembly1.cif"
    )
    
    # extract DataFrame with basepairs data
    builder = DSNABuilder()
    dsna_list = builder.build_double_strands(structure)
    
    # take the first double strand nucleic acid as an example
    dsna = dsna_list[0]
    dsna.get_dataframe()

.. code-block:: console

        i_chain_id  i_residue_index i_residue_name j_residue_name  j_residue_index j_chain_id
    0          B              402             DC             DG              433          C
    1          B              403             DC             DG              432          C
    2          B              404             DA             DT              431          C
    3          B              405             DT             DA              430          C
    4          B              406             DG             DC              429          C

In this case we have a gap in the basepairs at ``i_residue_index`` 407 and 408.
This results in two distinct paired segments of dsDNA.

In reality only 408 is a mispair. 407 is a non-standard 5CM-Guanine pair.
It's ignored by PDBNucleicAcids because it currently supports only standard Watson-Crick
basepairs.