utils

KINOMEScanMapper

This helper class retrieves sequence information from the raw data provided by DiscoverX, which only lists NCBI accessions, mutations and construct limits. The data is processed to obtain FASTA sequences that our pipelines can ingest.
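As a rough illustration of what "processing" accessions, mutations and construct limits into a sequence can involve, here is a minimal sketch. It is not the class's actual implementation: the function name `apply_construct`, the `A4G`-style mutation notation, the 1-based inclusive limits and the toy sequence are all assumptions made for the example.

```python
import re


def apply_construct(sequence, start, stop, mutations=()):
    """Apply point mutations to a full-length sequence, then trim it.

    Hypothetical helper: `start` and `stop` are assumed to be 1-based,
    inclusive residue positions, and each mutation is assumed to look
    like "A4G" (wild-type residue, position, mutant residue).
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Unsupported mutation format: {mutation!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        # Check that the sequence really has the expected residue there.
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: found {residues[position - 1]} instead")
        residues[position - 1] = mutant
    # Convert 1-based inclusive limits to a Python slice.
    return "".join(residues[start - 1:stop])


full_length = "MKTAYIAKQR"  # toy sequence, not a real kinase
construct = apply_construct(full_length, start=3, stop=8, mutations=["A4G"])
```

Mutations are applied before trimming so that their positions keep referring to the full-length numbering.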

Because this relies on online queries, the results are cached to disk by default.
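The caching behaviour can be pictured with a small sketch. This is only an assumed design, not the code used by `KINOMEScanMapper`: `cached_fetch`, the JSON cache file and the placeholder accession `NP_000000.1` are all hypothetical. The `use_cache` flag mirrors the documented meaning: when it is `False`, any existing cache is ignored and rewritten.

```python
import json
import tempfile
from pathlib import Path


def cached_fetch(key, fetch, cache_path, use_cache=True):
    """Return fetch(key), storing results in a JSON file at cache_path.

    Hypothetical helper. With use_cache=False the existing file is
    ignored and overwritten with the freshly fetched value.
    """
    cache_path = Path(cache_path)
    cache = {}
    if use_cache and cache_path.exists():
        cache = json.loads(cache_path.read_text())
    if key not in cache:
        cache[key] = fetch(key)  # the expensive online query
        cache_path.write_text(json.dumps(cache))
    return cache[key]


calls = []


def fake_fetch(accession):
    """Stand-in for an online query; records how often it is called."""
    calls.append(accession)
    return "MKTAYIAKQR"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sequences.json"
    first = cached_fetch("NP_000000.1", fake_fetch, path)
    second = cached_fetch("NP_000000.1", fake_fetch, path)  # served from disk
    third = cached_fetch("NP_000000.1", fake_fetch, path, use_cache=False)
```

In this sketch the stand-in query runs twice: once for the first call and once more when the cache is deliberately bypassed.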

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| raw_datasheet | Union[~AnyStr, pathlib.Path] | Raw CSV file with the DiscoverX information | PosixPath('/home/runner/work/kinoml/kinoml/kinoml/data/kinomescan/DiscoverX_489_Kinase_Assay_Construct_Information.csv') |
| use_cache | bool | Whether to read the data from cache if possible. Set to False to ignore existing caches and rewrite them. | True |

accession_for_name(self, name)

Source code in kinomescan/utils.py
    def accession_for_name(self, name: AnyStr):
        """
        Given a kinase name, return the corresponding NCBI accession
        """
        return self._data.loc[name, "accession"]

Given a kinase name, return the corresponding NCBI accession
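The name-based methods all use the same label lookup, `self._data.loc[name, column]`. The sketch below reproduces that pattern on a toy table to show what a caller can expect. The kinase names, accessions and the table itself are made up for illustration, and the real `_data` table may hold more columns.

```python
import pandas as pd

# Toy stand-in for the mapper's internal table, indexed by kinase name.
data = pd.DataFrame(
    {
        "name": ["KINASE_A", "KINASE_B"],
        "accession": ["NP_000001.1", "NP_000002.1"],
        "mutations": ["", "T315I"],
    }
).set_index("name")

# Same lookup pattern as accession_for_name: row label, then column label.
accession = data.loc["KINASE_B", "accession"]

# A name that is not in the index raises KeyError instead of returning None.
try:
    data.loc["UNKNOWN", "accession"]
except KeyError:
    missing = True
else:
    missing = False
```

Because the lookup raises `KeyError` for unknown names, callers that handle user-supplied names may want to catch that exception.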

mutations_for_name(self, name)

Source code in kinomescan/utils.py
    def mutations_for_name(self, name: AnyStr):
        """
        Given a kinase name, return the corresponding mutations
        """
        return self._data.loc[name, "mutations"]

Given a kinase name, return the corresponding mutations

sequence_for_accession(self, accession)

Source code in kinomescan/utils.py
    def sequence_for_accession(self, accession: AnyStr):
        """
        Given an NCBI identifier, return the corresponding FASTA sequence
        """
        return self._data[self._data.accession == accession].sequence.values

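Given an NCBI identifier, return the corresponding FASTA sequence(s). The result is an array with one entry per row whose accession matches, so it is empty when no row matches.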
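Unlike the name-based lookups, this method filters rows with a boolean mask and returns `.values`, so the result is a NumPy array rather than a single string. A minimal sketch on a toy table (all names, accessions and sequences here are invented, and `sequences_for` is a hypothetical stand-in for the method):

```python
import pandas as pd

# Toy table in which two constructs share one accession.
data = pd.DataFrame(
    {
        "accession": ["NP_000001.1", "NP_000001.1", "NP_000002.1"],
        "sequence": ["MKTAYIAK", "MKTGYIAK", "MSLLQRWE"],
    },
    index=["KINASE_A", "KINASE_A(mutant)", "KINASE_B"],
)


def sequences_for(accession):
    """Same expression as sequence_for_accession, applied to the toy table."""
    return data[data.accession == accession].sequence.values


shared = sequences_for("NP_000001.1")  # two matching rows
absent = sequences_for("NP_999999.9")  # no matching row, no exception
```

An unknown accession therefore yields an empty array instead of raising, which differs from the `KeyError` behaviour of the `.loc`-based methods.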

sequence_for_name(self, name)

Source code in kinomescan/utils.py
    def sequence_for_name(self, name: AnyStr):
        """
        Given a kinase name, return the corresponding FASTA sequence
        """
        return self._data.loc[name, "sequence"]

Given a kinase name, return the corresponding FASTA sequence

start_stop_for_name(self, name)

Source code in kinomescan/utils.py
    def start_stop_for_name(self, name: AnyStr):
        """
        Given a kinase name, return the corresponding start and stop positions
        """
        return self._data.loc[name, "start_stop"]

Given a kinase name, return the corresponding start and stop positions of the construct


Last update: April 24, 2020