kinoml.core.sequences

Sequence-like objects to build MolecularComponents and others.

Module Contents

kinoml.core.sequences.logger

class kinoml.core.sequences.Biosequence(sequence='', name='', metadata=None, **kwargs)

Bases: object

Base class for string representations of biological polymers (nucleic acids, peptides, proteins…).

Parameters:

sequence (str, default="") – The sequence in one-letter codes.
name (str, default="") – The sequence name.
metadata (dict or None, default=None) – Additional data as a dictionary.

ALPHABET

_sequence = ''

name = ''

metadata

property sequence

_query_sequence_sources(): Query available sources for sequence details. Override this method in subclasses to fetch data.

substitute(substitution)

Given XYYYZ, substitute element X at position YYY with Z, e.g. C1156Y.

Parameters: substitution (str) – Substitution to apply. It must be formatted as [existing element][1-indexed position][new element].

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.substitute("B2F")
>>> s.sequence
"AFCD"
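The substitution format described above can be parsed with a short regular expression. The following is a minimal standalone sketch on plain strings, not KinoML's implementation: `apply_substitution` is a hypothetical helper, and the check that the existing element matches the sequence at that position is an assumption about the intended behaviour.

```python
import re


def apply_substitution(sequence: str, substitution: str) -> str:
    """Return a copy of `sequence` with one substitution such as "B2F" applied.

    Hypothetical helper for illustration; not part of kinoml.
    """
    # Format: [existing element][1-indexed position][new element]
    match = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", substitution)
    if match is None:
        raise ValueError(f"Malformed substitution: {substitution!r}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    index = position - 1  # convert the 1-indexed position to a Python index
    # Assumption: the existing element must agree with the sequence.
    if sequence[index] != old:
        raise ValueError(f"Expected {old!r} at position {position}, found {sequence[index]!r}")
    return sequence[:index] + new + sequence[index + 1:]


print(apply_substitution("ABCD", "B2F"))  # AFCD
```

Unlike Biosequence.substitute, which changes the object in place, this sketch returns a new string.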

delete(first, last, insert='')

Delete all elements between the first and last positions, bounds included. Optionally, provide an insert to be placed at the position of the deletion.

Parameters:

first (int) – First residue to delete (1-indexed).
last (int) – Last residue to delete (1-indexed).
insert (str, default="") – Sequence that should be placed at the position of the deletion.

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.delete(3,3, insert="GH")
>>> s.sequence
"ABGHD"
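The 1-indexed, inclusive bounds map onto Python slicing as follows. This is a minimal standalone sketch on plain strings, not KinoML's implementation; `delete_range` is a hypothetical helper and its bounds check is an assumption.

```python
def delete_range(sequence: str, first: int, last: int, insert: str = "") -> str:
    """Remove positions first..last (1-indexed, inclusive) and place `insert` there.

    Hypothetical helper for illustration; not part of kinoml.
    """
    # Assumption: both bounds must lie inside the sequence and be ordered.
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"Invalid range {first}-{last} for length {len(sequence)}")
    # sequence[:first - 1] keeps everything before `first`;
    # sequence[last:] keeps everything after `last`.
    return sequence[:first - 1] + insert + sequence[last:]


print(delete_range("ABCD", 3, 3, insert="GH"))  # ABGHD
```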

insert(position, insert)

Insert a sequence at the given position.

Parameters:

position (int) – Position (1-indexed) to place the insertion.
insert (str) – The sequence of the insertion.

Examples

>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.insert(4, insert="EF")
>>> s.sequence
"ABCEFD"
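As the example shows, the inserted sequence lands in front of the element that currently occupies the given position. A minimal standalone sketch of that indexing follows; `insert_at` is a hypothetical helper on plain strings, and allowing `len(sequence) + 1` as a position (to append) is an assumption.

```python
def insert_at(sequence: str, position: int, insert: str) -> str:
    """Place `insert` so that it starts at the 1-indexed `position`.

    Hypothetical helper for illustration; not part of kinoml.
    """
    # Assumption: position len(sequence) + 1 appends to the end.
    if not 1 <= position <= len(sequence) + 1:
        raise ValueError(f"Position {position} is outside the sequence")
    index = position - 1  # convert the 1-indexed position to a Python index
    return sequence[:index] + insert + sequence[index:]


print(insert_at("ABCD", 4, "EF"))  # ABCEFD
```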

class kinoml.core.sequences.AminoAcidSequence(uniprot_id='', ncbi_id='', sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence for amino acid sequences.

Parameters:

uniprot_id (str, default="") – The UniProt ID.
ncbi_id (str, default="") – The NCBI ID.
sequence (str, default="") – The amino acid sequence in one-letter codes.
name (str, default="") – The sequence name.
metadata (dict or None, default=None) – Additional data as a dictionary.

Examples

Amino acid sequences can be created by providing the sequence manually or by fetching from e.g. UniProt:

>>> alatripeptide = AminoAcidSequence(sequence="AAA", name="alatripeptide")
>>> alatripeptide.sequence
"AAA"
>>> abl1 = AminoAcidSequence(uniprot_id="P00519", name="ABL1")
>>> abl1.sequence[:5]
"MLEIC"

Fetched sequences can be altered by providing information via metadata["mutations"]:

insertions - formatted like "ins123AGA"
deletions - formatted like "del12-15P" (the optional trailing P stands for a proline insert)
substitutions - formatted like "T315A"

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A"}
... )

Multiple mutations can be added sequentially in a single string:

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A del320-322P"}
... )
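To illustrate the three mutation formats, the sketch below splits a mutation string on whitespace and classifies each token. It is a standalone illustration, not KinoML's parser: `classify_mutations` and the regular expressions are hypothetical, and whitespace as the separator is inferred from the example above. The sketch only classifies tokens; it does not apply them to a sequence.

```python
import re

# Hypothetical patterns derived from the documented formats.
INSERTION = re.compile(r"ins(\d+)([A-Z]+)")        # e.g. ins123AGA
DELETION = re.compile(r"del(\d+)-(\d+)([A-Z]*)")   # e.g. del12-15P (insert optional)
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. T315A


def classify_mutations(mutations: str) -> list:
    """Turn a whitespace-separated mutation string into a list of tuples."""
    parsed = []
    for token in mutations.split():
        match = INSERTION.fullmatch(token)
        if match:
            parsed.append(("insertion", int(match.group(1)), match.group(2)))
            continue
        match = DELETION.fullmatch(token)
        if match:
            parsed.append(
                ("deletion", int(match.group(1)), int(match.group(2)), match.group(3))
            )
            continue
        match = SUBSTITUTION.fullmatch(token)
        if match:
            parsed.append(
                ("substitution", match.group(1), int(match.group(2)), match.group(3))
            )
            continue
        raise ValueError(f"Unrecognized mutation: {token!r}")
    return parsed


for entry in classify_mutations("T315A del12-15P ins123AGA"):
    print(entry)
```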

An artificial construct consisting of only part of the sequence can be specified via metadata["construct_range"]:

>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519",
...     name="ABL1",
...     metadata={"mutations": "T315A", "construct_range": "229-512"}
... )
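A construct range such as "229-512" can be read as a pair of residue positions. The sketch below is a standalone illustration, not KinoML's implementation: `extract_construct` is a hypothetical helper, and treating the range as 1-indexed with both bounds included is an assumption the text above does not state.

```python
def extract_construct(sequence: str, construct_range: str) -> str:
    """Return the part of `sequence` covered by a range such as "229-512".

    Hypothetical helper for illustration; not part of kinoml.
    Assumption: positions are 1-indexed and both bounds are included.
    """
    first_text, last_text = construct_range.split("-")
    first, last = int(first_text), int(last_text)
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"Invalid range {construct_range!r} for length {len(sequence)}")
    return sequence[first - 1:last]


print(extract_construct("ABCDEFGH", "3-5"))  # CDE
```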

ALPHABET = 'ACDEFGHIKLMNPQRSTVWY'

uniprot_id = ''

ncbi_id = ''

_query_sequence_sources(): Query available sources for sequence details. Add additional methods below to allow fetching from other sources. Apply mutations and other modifications if given via metadata.

_query_uniprot(): Fetch the amino acid sequence from UniProt.

_query_ncbi(): Fetch the amino acid sequence from NCBI.

static ncbi_to_uniprot(ncbi_id)

Convert an NCBI protein accession to the corresponding UniProt ID.

Parameters: ncbi_id (str) – The NCBI protein accession.
Returns: The corresponding UniProt ID, or an empty string if the conversion fails.
Return type: str

class kinoml.core.sequences.DNASequence(sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence that only allows DNA bases.

ALPHABET = 'ATCG'

class kinoml.core.sequences.RNASequence(sequence='', name='', metadata=None, **kwargs)

Bases: Biosequence

Biosequence that only allows RNA bases.

ALPHABET = 'AUCG'
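Each subclass restricts its sequences to the characters in its ALPHABET. How KinoML enforces this is not described here, so the following is only a standalone sketch of such a check; `invalid_characters` is a hypothetical helper, and the alphabet constants are copied from the class attributes listed above.

```python
# Alphabets copied from the documented class attributes.
AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
DNA_ALPHABET = "ATCG"
RNA_ALPHABET = "AUCG"


def invalid_characters(sequence: str, alphabet: str) -> set:
    """Return the characters of `sequence` that are not in `alphabet`.

    Hypothetical helper for illustration; not part of kinoml.
    """
    return set(sequence) - set(alphabet)


print(invalid_characters("ATCGU", DNA_ALPHABET))  # {'U'}
print(invalid_characters("AUCG", RNA_ALPHABET))   # set()
```

An empty result means the sequence uses only characters from the given alphabet.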