Cas9 nuclease

Predict SpCas9 activity

DeepSpCas9 is a prediction model that estimates the indel frequency introduced by SpCas9 at the target site of a given sgRNA (Kim et al. Sci. Adv. 2019). The model was developed with TensorFlow (version >= 2.6). All dependencies are installed along with the GenET package.

from genet.predict import SpCas9

# List the 30-bp target contexts whose Cas9 activity you want to predict.
# Input seq: 4 bp 5' context + 20 bp guide + 3 bp PAM + 3 bp 3' context

spcas = SpCas9()

list_target = [
    'TCACCTTCGTTTTTTTCCTTCTGCAGGAGG',
    'CCTTCGTTTTTTTCCTTCTGCAGGAGGACA',
    'CTTTCAAGAACTCTTCCACCTCCATGGTGT',
]

df_out = spcas.predict(list_target)

>>> df_out
                           Target                Spacer    SpCas9
0  TCACCTTCGTTTTTTTCCTTCTGCAGGAGG  CTTCGTTTTTTTCCTTCTGC  2.801172
1  CCTTCGTTTTTTTCCTTCTGCAGGAGGACA  CGTTTTTTTCCTTCTGCAGG  2.253288
2  CTTTCAAGAACTCTTCCACCTCCATGGTGT  CAAGAACTCTTCCACCTCCA  53.43182
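
The 30-bp input layout described above (4 bp 5' context + 20 bp guide + 3 bp PAM + 3 bp 3' context) can be checked before calling the model. The following is a minimal sketch in plain Python; `split_context` is a hypothetical helper written for illustration and is not part of GenET.

```python
def split_context(target30: str) -> dict:
    """Split a 30-bp target context into its four parts.

    Hypothetical helper (not a GenET function): it only mirrors the
    4 + 20 + 3 + 3 layout described in the comments above.
    """
    seq = target30.upper()
    if len(seq) != 30:
        raise ValueError(f"expected 30 bp, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return {
        "context_5p": seq[:4],     # 4 bp upstream of the guide
        "guide": seq[4:24],        # 20 bp guide (the 'Spacer' column)
        "pam": seq[24:27],         # 3 bp PAM
        "context_3p": seq[27:],    # 3 bp downstream of the PAM
    }


parts = split_context("TCACCTTCGTTTTTTTCCTTCTGCAGGAGG")
print(parts["guide"], parts["pam"])  # CTTCGTTTTTTTCCTTCTGC AGG
```

The guide returned for the first target matches the Spacer column in the output above.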

Alternatively, you can use the 'search' method to identify all possible SpCas9 target sites within a longer sequence and obtain a predicted score for each.

from genet.predict import SpCas9

# Provide the full sequence in which you want to find Cas9 target sites.
gene = 'ttcagctctacgtctcctccgagagccgcttcaacaccctggccgagttggttcatcatcattcaacggtggccgacgggctcatcaccacgctccattatccagccccaaagcgcaacaagcccactgtctatggtgtgtcccccaactacgacaagtgggagatggaacgcacggacatcaccatgaagcacaagctgggcgggggccagtacggggaggtgtacgagggcgtgtggaagaaatacagcctgacggtggccgtgaagaccttgaaggtagg'

spcas = SpCas9()
df_out = spcas.search(gene)

>>> df_out.head()

                           Target                Spacer  Strand  Start  End    SpCas9
0  CCTCCGAGAGCCGCTTCAACACCCTGGCCG  CGAGAGCCGCTTCAACACCC       +     15   45  67.39446
1  GCCGCTTCAACACCCTGGCCGAGTTGGTTC  CTTCAACACCCTGGCCGAGT       +     24   54  27.06508
2  CCGAGTTGGTTCATCATCATTCAACGGTGG  GTTGGTTCATCATCATTCAA       +     42   72  34.11356
3  AGTTGGTTCATCATCATTCAACGGTGGCCG  GGTTCATCATCATTCAACGG       +     45   75  76.43662
4  TCATCATCATTCAACGGTGGCCGACGGGCT  CATCATTCAACGGTGGCCGA       +     52   82  29.63767
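
To make the Start and End columns easier to interpret, here is a rough sketch of how the + strand rows above can be reproduced in plain Python. It slides a 30-bp window along the sequence and keeps windows whose PAM (positions 24 to 26 of the window) reads NGG. This is an assumption about the scanning logic, not GenET's actual implementation: `find_ngg_contexts` is a hypothetical name, the sketch ignores the reverse strand, and it does no scoring. For the rows shown, Start and End behave like 0-based, end-exclusive offsets into the input (for example, `gene[15:45]` is the first Target).

```python
def find_ngg_contexts(sequence: str) -> list:
    """Return 30-bp contexts on the + strand whose PAM matches NGG.

    Hypothetical sketch: forward strand only, no activity prediction.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - 30 + 1):
        window = seq[start:start + 30]
        # Layout: 4 bp context + 20 bp guide + 3 bp PAM + 3 bp context,
        # so the PAM occupies window[24:27] and 'GG' sits at window[25:27].
        if window[25:27] == "GG":
            hits.append({
                "Target": window,
                "Spacer": window[4:24],
                "Strand": "+",
                "Start": start,        # 0-based, inclusive
                "End": start + 30,     # 0-based, exclusive
            })
    return hits


# First 82 bases of the example gene used above.
prefix = ("ttcagctctacgtctcctccgagagccgcttcaacaccct"
          "ggccgagttggttcatcatcattcaacggtggccgacgggct")
hits = find_ngg_contexts(prefix)
print([h["Start"] for h in hits])  # [15, 24, 42, 45, 52]
```

The five start positions agree with the first five rows returned by `spcas.search(gene)`.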

Predict SpCas9 variant activity

DeepSpCas9variants is a prediction model that estimates the indel frequency introduced by SpCas9 PAM variants at the target site of a given sgRNA (Kim et al. Nat. Biotechnol. 2020). The model was developed with TensorFlow (version >= 2.6). All dependencies are installed along with the GenET package.

from genet.predict import CasVariant

# Available Cas9 variants: 
# SpCas9-NG, SpCas9-NRCH, SpCas9-NRRH, SpCas9-NRTH, SpCas9-Sc++, SpCas9-SpCas9, SpCas9-SpG, SpCas9-SpRY, SpCas9-VRQR
cas_ng = CasVariant('SpCas9-NG')

# List the 30-bp target contexts whose Cas9 variant activity you want to predict.
# Input seq: 4 bp 5' context + 20 bp guide + 3 bp PAM + 3 bp 3' context

list_target30 = [
    'TCACCTTCGTTTTTTTCCTTCTGCAGGAGG',
    'CCTTCGTTTTTTTCCTTCTGCAGGAGGACA',
    'CTTTCAAGAACTCTTCCACCTCCATGGTGT',
]

df_out = cas_ng.predict(list_target30)

>>> df_out
                           Target                Spacer  SpCas9-NG
0  TCACCTTCGTTTTTTTCCTTCTGCAGGAGG  CTTCGTTTTTTTCCTTCTGC   0.618299
1  CCTTCGTTTTTTTCCTTCTGCAGGAGGACA  CGTTTTTTTCCTTCTGCAGG   1.134845
2  CTTTCAAGAACTCTTCCACCTCCATGGTGT  CAAGAACTCTTCCACCTCCA   36.74358

Similarly, CasVariant also provides the 'search' method. It automatically identifies the targets compatible with the PAMs of the selected variant and calculates predictive scores. For instance, SpCas9-NRCH identifies NG+NA+NNG PAMs.

from genet.predict import CasVariant

# Provide the full sequence in which you want to find Cas9 variant target sites.
gene = 'ttcagctctacgtctcctccgagagccgcttcaacaccctggccgagttggttcatcatcattcaacggtggccgacgggctcatcaccacgctccattatccagccccaaagcgcaacaagcccactgtctatggtgtgtcccccaactacgacaagtgggagatggaacgcacggacatcaccatgaagcacaagctgggcgggggccagtacggggaggtgtacgagggcgtgtggaagaaatacagcctgacggtggccgtgaagaccttgaaggtagg'


cas_nrch = CasVariant('SpCas9-NRCH')
df_out = cas_nrch.search(gene)

>>> df_out.head()
                           Target                Spacer  Strand  Start  End  SpCas9-NRCH
0  TCAGCTCTACGTCTCCTCCGAGAGCCGCTT  CTCTACGTCTCCTCCGAGAG       +      1   31     26.43327
1  CAGCTCTACGTCTCCTCCGAGAGCCGCTTC  TCTACGTCTCCTCCGAGAGC       +      2   32     40.16034
2  CTACGTCTCCTCCGAGAGCCGCTTCAACAC  GTCTCCTCCGAGAGCCGCTT       +      7   37     47.06001
3  TACGTCTCCTCCGAGAGCCGCTTCAACACC  TCTCCTCCGAGAGCCGCTTC       +      8   38     20.26012
4  CGTCTCCTCCGAGAGCCGCTTCAACACCCT  TCCTCCGAGAGCCGCTTCAA       +     10   40     45.58047
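
PAM patterns such as NG, NA and NNG use IUPAC degenerate base codes (N = any base, R = A or G, H = A, C or T). How GenET matches PAMs internally is not documented here; the following is only a generic sketch of checking a PAM against such a pattern by translating it to a regular expression. `IUPAC` and `pam_matches` are hypothetical names introduced for illustration. A pattern shorter than the PAM constrains only the leading bases.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_matches(pam: str, pattern: str) -> bool:
    """Check whether a PAM sequence starts with the given IUPAC pattern.

    Hypothetical helper, not part of GenET.
    """
    regex = "".join(IUPAC[code] for code in pattern.upper())
    # re.match anchors at the start, so 'NG' checks only the first two bases.
    return re.match(regex, pam.upper()) is not None


print(pam_matches("CCG", "NNG"))  # True
print(pam_matches("CCG", "NG"))   # False
print(pam_matches("TGA", "NG"))   # True
```

For example, the PAM of the first row above is CCG, which fits the NNG pattern but not NG.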