kinoml.core.sequences

Sequence-like objects to build MolecularComponents and others.

Module Contents

kinoml.core.sequences.logger
class kinoml.core.sequences.Biosequence(sequence='', name='', metadata=None, **kwargs)

Bases: object

Base class for string representations of biological polymers (nucleic acids, peptides, proteins…).

Parameters
  • sequence (str, default="") – The sequence in one-letter codes.

  • name (str, default="") – The sequence name.

  • metadata (dict or None, default=None) – Additional data as a dictionary.

property sequence
ALPHABET
sequence()
_query_sequence_sources()

Query available sources for sequence details. Override this method in subclasses to fetch data.

substitute(substitution)

Given XYYYZ, substitute element X at position YYY with Z, e.g. C1156Y.

Parameters

substitution (str) – Substitution to apply. It must be formatted as [existing element][1-indexed position][new element].
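
The format described above can be parsed with a regular expression. The following is a minimal standalone sketch, not KinoML's implementation: the helper name `apply_substitution`, the restriction to single letters, and the error handling are assumptions made for illustration.

```python
import re

# Hypothetical helper, not part of kinoml: illustrates the
# [existing element][1-indexed position][new element] format.
_SUBSTITUTION = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def apply_substitution(sequence: str, substitution: str) -> str:
    """Return a copy of ``sequence`` with the substitution applied."""
    match = _SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"Malformed substitution: {substitution!r}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    index = position - 1  # convert the 1-indexed position to a 0-indexed one
    if sequence[index] != old:
        # Assumed behavior: refuse to substitute if the existing element differs.
        raise ValueError(
            f"Expected {old!r} at position {position}, found {sequence[index]!r}"
        )
    return sequence[:index] + new + sequence[index + 1:]


print(apply_substitution("ABCD", "B2F"))  # AFCD
```

Checking the existing element before replacing it catches off-by-one errors in the position; whether `substitute` itself performs this check is not stated here.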

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.substitute("B2F")
>>> s.sequence
"AFCD"
delete(first, last, insert='')

Delete all elements between the first and last positions, including both bounds. Optionally, provide an insert that will be placed at the position of the deletion.

Parameters
  • first (int) – First residue to delete (1-indexed).

  • last (int) – Last residue to delete (1-indexed).

  • insert (str, default="") – Sequence that should be placed at the position of the deletion.

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.delete(3,3, insert="GH")
>>> s.sequence
"ABGHD"
insert(position, insert)

Insert a sequence at the given position.

Parameters
  • position (int) – Position (1-indexed) to place the insertion.

  • insert (str) – The sequence of the insertion.

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.insert(4, insert="EF")
>>> s.sequence
"ABCEFD"
class kinoml.core.sequences.AminoAcidSequence(uniprot_id='', ncbi_id='', sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence for amino acid sequences.

Parameters
  • uniprot_id (str, default="") – The UniProt ID.

  • ncbi_id (str, default="") – The NCBI ID.

  • sequence (str, default="") – The amino acid sequence in one-letter codes.

  • name (str, default="") – The sequence name.

  • metadata (dict or None, default=None) – Additional data as a dictionary.

Examples

Amino acid sequences can be created by providing the sequence manually or by fetching it from a database such as UniProt:

>>> alatripeptide = AminoAcidSequence(sequence="AAA", name="alatripeptide")
>>> alatripeptide.sequence
"AAA"
>>> abl1 = AminoAcidSequence(uniprot_id="P00519", name="ABL1")
>>> abl1.sequence[:5]
"MLEIC"
Fetched sequences can be altered by providing information via metadata["mutations"]. The following formats are supported:
  • insertions - formatted like "ins123AGA"

  • deletions - formatted like "del12-15P" (the trailing P is an optional insert, here a proline)

  • substitutions - formatted like "T315A"

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A"}
... )

Multiple mutations can be added sequentially in a single string:

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A del320-322P"}
... )
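
How such a mutation string might be interpreted can be sketched without any network access. The function below is a hypothetical standalone illustration, not KinoML's parser: the name `apply_mutations`, the regular expressions, the uppercase-only restriction, and the left-to-right application (positions refer to the sequence as already modified by earlier mutations) are all assumptions.

```python
import re

# Hypothetical patterns for the three documented formats.
_INSERTION = re.compile(r"ins(\d+)([A-Z]+)")        # e.g. ins123AGA
_DELETION = re.compile(r"del(\d+)-(\d+)([A-Z]*)")   # e.g. del12-15P
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. T315A


def apply_mutations(sequence: str, mutations: str) -> str:
    """Apply whitespace-separated mutations from left to right."""
    for mutation in mutations.split():
        if m := _INSERTION.fullmatch(mutation):
            position, insert = int(m.group(1)), m.group(2)
            # Place the insert so that it starts at the 1-indexed position.
            sequence = sequence[:position - 1] + insert + sequence[position - 1:]
        elif m := _DELETION.fullmatch(mutation):
            first, last, insert = int(m.group(1)), int(m.group(2)), m.group(3)
            if first > last:
                raise ValueError(f"Invalid deletion range: {mutation!r}")
            # Remove first..last (inclusive) and put the optional insert there.
            sequence = sequence[:first - 1] + insert + sequence[last:]
        elif m := _SUBSTITUTION.fullmatch(mutation):
            old, position, new = m.group(1), int(m.group(2)), m.group(3)
            if sequence[position - 1] != old:
                raise ValueError(f"{mutation!r} does not match the sequence")
            sequence = sequence[:position - 1] + new + sequence[position:]
        else:
            raise ValueError(f"Unrecognized mutation: {mutation!r}")
    return sequence


print(apply_mutations("ABCDEFGH", "B2F del4-5P ins1XY"))  # XYAFCPFGH
```

Because each mutation shifts later positions in this sketch, the order of the mutations in the string matters.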

An artificial construct consisting of only part of the sequence can be specified via metadata["construct_range"]:

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519",
...     name="ABL1",
...     metadata={"mutations": "T315A", "construct_range": "229-512"}
... )
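
The effect of a construct range can be illustrated with plain string slicing. This is a sketch under stated assumptions, not KinoML's code: the helper name `apply_construct_range` is hypothetical, and the range is assumed to be 1-indexed with both bounds included, mirroring the convention documented for `delete`.

```python
def apply_construct_range(sequence: str, construct_range: str) -> str:
    """Return the part of ``sequence`` covered by a "first-last" range.

    Assumption: positions are 1-indexed and both bounds are included.
    """
    first, last = (int(part) for part in construct_range.split("-"))
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"Invalid construct range: {construct_range!r}")
    # Python slices are 0-indexed and exclude the end, hence first - 1 and last.
    return sequence[first - 1:last]


print(apply_construct_range("ABCDEFGH", "3-6"))  # CDEF
```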
ALPHABET = 'ACDEFGHIKLMNPQRSTVWY'
_query_sequence_sources()

Query available sources for sequence details. Add additional methods below to allow fetching from other sources. Apply mutations and similar modifications if specified via metadata.

_query_uniprot()

Fetch the amino acid sequence from UniProt.

_query_ncbi()

Fetch the amino acid sequence from NCBI.

static ncbi_to_uniprot(ncbi_id)

Convert an NCBI protein accession to the corresponding UniProt ID.

Parameters

ncbi_id (str) – The NCBI protein accession.

Returns

The corresponding UniProt ID, or an empty string if the conversion fails.

Return type

str

class kinoml.core.sequences.DNASequence(sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence that only allows DNA bases.

ALPHABET = 'ATCG'
class kinoml.core.sequences.RNASequence(sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence that only allows RNA bases.

ALPHABET = 'AUCG'
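
The ALPHABET attributes suggest a simple membership check. How and when KinoML enforces the alphabet is not described here, so the following is only a hypothetical sketch; the function `invalid_characters` and the module-level constants are illustrative names, with the alphabet strings copied from the class attributes above.

```python
# Alphabets as listed for DNASequence and RNASequence.
DNA_ALPHABET = "ATCG"
RNA_ALPHABET = "AUCG"


def invalid_characters(sequence: str, alphabet: str) -> list:
    """Return the sorted characters of ``sequence`` that are not in ``alphabet``."""
    return sorted(set(sequence) - set(alphabet))


print(invalid_characters("GATTACA", DNA_ALPHABET))  # []
print(invalid_characters("GATTACA", RNA_ALPHABET))  # ['T']
```

An empty list means the sequence uses only characters from the given alphabet.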