kinoml.core.sequences
Sequence-like objects to build MolecularComponents and others.
Module Contents
- kinoml.core.sequences.logger
- class kinoml.core.sequences.Biosequence(sequence='', name='', metadata=None, **kwargs)
Bases: object
Base class for string representations of biological polymers (nucleic acids, peptides, proteins…).
- Parameters:
sequence (str, default="") – The sequence in one-letter codes.
name (str, default="") – The sequence name.
metadata (dict or None, default=None) – Additional data as a dictionary.
- ALPHABET
- _sequence = ''
- name = ''
- metadata
- property sequence
- _query_sequence_sources()
Query available sources for sequence details. Override this method in subclasses to fetch data.
- substitute(substitution)
Given XYYYZ, substitute element X at position YYY with Z, e.g. C1156Y.
- Parameters:
substitution (str) – Substitution to apply. It must be formatted as [existing element][1-indexed position][new element].
Examples
>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.substitute("B2F")
>>> s.sequence
"AFCD"
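The substitution format can be illustrated with a small standalone sketch. It does not import kinoml and is not the library's implementation; `apply_substitution` is a hypothetical helper, and the check that the existing element matches the sequence is an assumption about reasonable behaviour rather than something this page states.

```python
import re


def apply_substitution(sequence: str, substitution: str) -> str:
    """Hypothetical helper: apply a substitution such as "B2F" to a plain string."""
    # Expected layout: [existing element][1-indexed position][new element]
    match = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", substitution)
    if match is None:
        raise ValueError(f"Malformed substitution: {substitution!r}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert the 1-indexed position to a 0-indexed one
    if not 0 <= index < len(sequence):
        raise IndexError(f"Position {position} is outside the sequence")
    # Assumption: refuse to substitute if the existing element does not match.
    if sequence[index] != old:
        raise ValueError(f"Expected {old!r} at position {position}, found {sequence[index]!r}")
    return sequence[:index] + new + sequence[index + 1:]


print(apply_substitution("ABCD", "B2F"))  # AFCD
```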
- delete(first, last, insert='')
Delete all elements between the first and last positions, including both bounds. Optionally, provide an insert that should be placed at the position of the deletion.
- Parameters:
first (int) – First residue to delete (1-indexed).
last (int) – Last residue to delete (1-indexed).
insert (str, default="") – Sequence that should be placed at the position of the deletion.
Examples
>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.delete(3, 3, insert="GH")
>>> s.sequence
"ABGHD"
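The inclusive, 1-indexed bounds can be mapped onto Python slicing as in the following sketch. `delete_range` is a hypothetical standalone function written for illustration, not the method's actual source; the bounds check is an assumption.

```python
def delete_range(sequence: str, first: int, last: int, insert: str = "") -> str:
    """Hypothetical helper: remove positions first..last (1-indexed, inclusive)."""
    # Assumption: reject ranges that fall outside the sequence or are reversed.
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"Invalid range {first}-{last} for length {len(sequence)}")
    # sequence[:first - 1] keeps everything before `first`;
    # sequence[last:] keeps everything after `last`.
    return sequence[:first - 1] + insert + sequence[last:]


print(delete_range("ABCD", 3, 3, insert="GH"))  # ABGHD
```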
- insert(position, insert)
Insert a sequence at the given position.
- Parameters:
position (int) – Position (1-indexed) to place the insertion.
insert (str) – The sequence of the insertion.
Examples
>>> s = Biosequence(sequence="ABCD")
>>> s.sequence
"ABCD"
>>> s.insert(4, insert="EF")
>>> s.sequence
"ABCEFD"
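Judging from the example, the insertion is placed in front of the element currently at the given position, so that the first inserted element ends up at that position. A minimal sketch of that reading follows; `insert_at` is a hypothetical helper, and allowing `len(sequence) + 1` as a way to append is an assumption.

```python
def insert_at(sequence: str, position: int, insertion: str) -> str:
    """Hypothetical helper: place `insertion` so it starts at `position` (1-indexed)."""
    # Assumption: position len(sequence) + 1 is accepted and appends at the end.
    if not 1 <= position <= len(sequence) + 1:
        raise ValueError(f"Position {position} is outside the sequence")
    index = position - 1  # 0-indexed split point
    return sequence[:index] + insertion + sequence[index:]


print(insert_at("ABCD", 4, "EF"))  # ABCEFD
```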
- class kinoml.core.sequences.AminoAcidSequence(uniprot_id='', ncbi_id='', sequence='', name='', metadata=None, **kwargs)
Bases: Biosequence
Biosequence for amino acid sequences.
- Parameters:
uniprot_id (str, default="") – The UniProt ID.
ncbi_id (str, default="") – The NCBI ID.
sequence (str, default="") – The amino acid sequence in one-letter codes.
name (str, default="") – The sequence name.
metadata (dict or None, default=None) – Additional data as a dictionary.
Examples
Amino acid sequences can be created by providing the sequence manually or by fetching from e.g. UniProt:
>>> alatripeptide = AminoAcidSequence(sequence="AAA", name="alatripeptide")
>>> alatripeptide.sequence
"AAA"
>>> abl1 = AminoAcidSequence(uniprot_id="P00519", name="ABL1")
>>> abl1.sequence[:5]
"MLEIC"
Fetched sequences can be altered by providing information via metadata["mutations"], using the following formats:
insertions - formatted like "ins123AGA"
deletions - formatted like "del12-15P" (the optional trailing P stands for a proline inserted at the deletion site)
substitutions - formatted like "T315A"
>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A"}
... )
Multiple mutations can be given sequentially in a single space-separated string:
>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519", name="ABL1", metadata={"mutations": "T315A del320-22P"}
... )
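To make the three mutation formats concrete, the sketch below splits a mutation string on whitespace and classifies each token with regular expressions. It is a standalone illustration under stated assumptions, not the parser kinoml uses: `classify_mutations` and the patterns are hypothetical, and the patterns assume upper-case one-letter codes and fully written positions.

```python
import re

# Hypothetical patterns derived from the documented examples
# "ins123AGA", "del12-15P" and "T315A".
_PATTERNS = {
    "insertion": re.compile(r"ins(\d+)([A-Z]+)"),
    "deletion": re.compile(r"del(\d+)-(\d+)([A-Z]*)"),  # trailing insert is optional
    "substitution": re.compile(r"([A-Z])(\d+)([A-Z])"),
}


def classify_mutations(mutations: str) -> list:
    """Return (kind, captured groups) for each whitespace-separated token."""
    classified = []
    for token in mutations.split():
        for kind, pattern in _PATTERNS.items():
            match = pattern.fullmatch(token)
            if match:
                classified.append((kind, match.groups()))
                break
        else:
            # No pattern matched this token.
            raise ValueError(f"Unrecognised mutation: {token!r}")
    return classified


for kind, groups in classify_mutations("T315A del12-15P ins123AGA"):
    print(kind, groups)
```

The captured positions are returned as strings; a caller would convert them with `int()` before applying them to a sequence.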
An artificial construct consisting of only part of the sequence can be specified via metadata["construct_range"]:
>>> abl1 = AminoAcidSequence(
...     uniprot_id="P00519",
...     name="ABL1",
...     metadata={"mutations": "T315A", "construct_range": "229-512"}
... )
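How a range string such as "229-512" could translate into a slice is sketched below. This page does not state whether the bounds are 1-indexed or inclusive, so the sketch assumes both, in line with the other methods documented here; `apply_construct_range` is a hypothetical helper, not part of kinoml.

```python
def apply_construct_range(sequence: str, construct_range: str) -> str:
    """Hypothetical helper: keep only the residues in a "first-last" range."""
    first_text, last_text = construct_range.split("-")
    first, last = int(first_text), int(last_text)
    # Assumption: bounds are 1-indexed and inclusive.
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"Invalid construct range: {construct_range!r}")
    return sequence[first - 1:last]


print(apply_construct_range("ABCDEFGH", "3-5"))  # CDE
```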
- ALPHABET = 'ACDEFGHIKLMNPQRSTVWY'
- uniprot_id = ''
- ncbi_id = ''
- _query_sequence_sources()
Query available sources for sequence details. Add additional methods below to allow fetching from other sources. Mutations and similar alterations are applied if given via metadata.
- _query_uniprot()
Fetch the amino acid sequence from UniProt.
- _query_ncbi()
Fetch the amino acid sequence from NCBI.
- static ncbi_to_uniprot(ncbi_id)
Convert an NCBI protein accession to the corresponding UniProt ID.
- Parameters:
ncbi_id (str) – The NCBI protein accession.
- Returns:
The corresponding UniProt ID, or an empty string if the conversion was not successful.
- Return type:
str
- class kinoml.core.sequences.DNASequence(sequence='', name='', metadata=None, **kwargs)
Bases: Biosequence
Biosequence that only allows DNA bases.
- ALPHABET = 'ATCG'
- class kinoml.core.sequences.RNASequence(sequence='', name='', metadata=None, **kwargs)
Bases: Biosequence
Biosequence that only allows RNA bases.
- ALPHABET = 'AUCG'
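The ALPHABET attributes suggest a simple membership check on each element of a sequence. The sketch below shows one way such a check could look; how, when, or whether the library enforces the alphabet is not described on this page, and `invalid_letters` is a hypothetical helper that only reuses the alphabet strings listed above.

```python
DNA_ALPHABET = "ATCG"  # value listed for DNASequence.ALPHABET
RNA_ALPHABET = "AUCG"  # value listed for RNASequence.ALPHABET


def invalid_letters(sequence: str, alphabet: str) -> list:
    """Hypothetical helper: sorted list of letters not contained in the alphabet."""
    return sorted(set(sequence) - set(alphabet))


print(invalid_letters("ATCGU", DNA_ALPHABET))  # ['U']
print(invalid_letters("AUCG", RNA_ALPHABET))   # []
```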