Skip to content

DeepPrime

Predict Prime editing efficiency

DeepPrime is a prediction model for evaluating prime editing guideRNAs (pegRNAs) that target specific target sites for prime editing (Yu et al. Cell 2023). DeepSpCas9 prediction score is calculated simultaneously and requires tensorflow (version >=2.6). DeepPrime was developed on pytorch.

How to Use DeepPrime

To use DeepPrime, you need to prepare a DNA sequence containing the intended prime editing and surrounding context information as input. Intended prime editing can only involve 1-3nt substitution, insertion, or deletion, and it is not possible to introduce multiple edit types in combination. The position where prime editing is introduced is indicated in parentheses, and the original and prime-edited sequences are separated using '/'.

Some example inputs are as follows:

# Example 1: 1bp substitution (T to A)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(T/A)CAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

# Example 2: 3bp insertion (CTT insertion)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(/CTT)TCAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

# Example 3: 2bp deletion (TC deletion)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(TC/)AGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

If you have prepared the input as described above, you can use DeepPrime as follows. When you input the target sequence and editing informations into DeepPrime and run it, it designs all possible types of pegRNAs for the given sequence and automatically calculates their corresponding biofeatures. You can check the calculated biofeatures using .features.

from genet.predict import DeepPrime

input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(T/A)CAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

pegrna = DeepPrime(input_seq)

# check designed pegRNAs
>>> pegrna.features.head()
ID Spacer RT-PBS PBS_len RTT_len RT-PBS_len Edit_pos Edit_len RHA_len Target ... deltaTm_Tm4-Tm2 GC_count_PBS GC_count_RTT GC_count_RT-PBS GC_contents_PBS GC_contents_RTT GC_contents_RT-PBS MFE_RT-PBS-polyT MFE_Spacer DeepSpCas9_score
0 SampleName AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG 7 35 42 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... ... -340.105 5 16 21 71.42857 45.71429 50 -10.4 -0.6 45.96754
1 SampleName AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG 8 35 43 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... ... -340.105 6 16 22 75 45.71429 51.16279 -10.4 -0.6 45.96754
2 SampleName AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT 9 35 44 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... ... -340.105 6 16 22 66.66667 45.71429 50 -10.4 -0.6 45.96754
3 SampleName AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG 10 35 45 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... ... -340.105 7 16 23 70 45.71429 51.11111 -10.4 -0.6 45.96754
4 SampleName AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT 11 35 46 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... ... -340.105 7 16 23 63.63636 45.71429 50 -10.4 -0.6 45.96754

Next, select model PE system and run DeepPrime

pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T')

>>> pe2max_output.head()

ID PE2max_score Spacer RT-PBS PBS_len RTT_len RT-PBS_len Edit_pos Edit_len RHA_len Target
0 SampleName 0.904387 AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG 7 35 42 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
1 SampleName 2.375938 AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG 8 35 43 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
2 SampleName 2.61238 AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT 9 35 44 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
3 SampleName 3.641537 AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG 10 35 45 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
4 SampleName 3.768321 AAGACAACACCCTTGCCTTG CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT 11 35 46 34 1 1 ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...

Predicting efficiencies of existing pegRNAs

When the target, PBS, and RT template sequences are accurately inputted, DeepPrimeGuideRNA predicts the DeepPrime score of the corresponding pegRNA. For example, let's assume we have the following target and pegRNA:

prime_editing_complex

To obtain the DeepPrime score of the pegRNA above, you can execute the code as follow; similar to .predict method in DeepPrime, you can specify pe_system and cell_type.

from genet.predict import DeepPrimeGuideRNA

target    = 'TTTAAGGTTTCAGTTGACATTTGCAGGTTATAGTTCTTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCA'
pbs       = 'AATGTCAAC'
rtt       = 'AGAAACTGAGACGAACTATAACCTGCA'
edit_len  = 1
edit_pos  = 16
edit_type = 'sub'

pegrna = DeepPrimeGuideRNA('pegRNA_test', target=target, pbs=pbs, rtt=rtt, 
                           edit_len=edit_len, edit_pos=edit_pos, edit_type=edit_type)

pe2max_score = pegrna.predict('PE2max')
print(pe2max_score) # 8.23717212677002

The inputs for DeepPrimeGuideRNA are configured as follows:

Input Type Description
sID str Name of sample or pegRNA
target str 4nt additional context sequence must be included in the 5' direction. The Protospacer (region to which the guide sequence is attached) is oriented in the 5'->3' direction and the target sequence must be 74nt in length.
pbs str The PBS sequence from the pegRNA. Both T (DNA) and U (RNA) forms are acceptable.
rtt str The RT template sequence from the pegRNA. Both T (DNA) and U (RNA) forms are acceptable.
edit_len int Select one of 1, 2, or 3 according to the intended prime editing.
edit_pos int Select one from 1-40 according to the intended prime editing.
edit_type str Select one from 'sub', 'ins', 'del' according to the intended prime editing.

Current available DeepPrime models:

Cell type PE system Model
HEK293T PE2 DeepPrime_base
HEK293T NRCH_PE2 DeepPrime-FT: HEK293T, NRCH-PE2 with Optimized scaffold
HEK293T NRCH_PE2max DeepPrime-FT: HEK293T, NRCH-PE2max with Optimized scaffold
HEK293T PE2 DeepPrime-FT: HEK293T, PE2 with Conventional scaffold
HEK293T PE2max-e DeepPrime-FT: HEK293T, PE2max with Optimized scaffold and epegRNA
HEK293T PE2max DeepPrime-FT: HEK293T, PE2max with Optimized scaffold
HEK293T PE4max-e DeepPrime-FT: HEK293T, PE4max with Optimized scaffold and epegRNA
HEK293T PE4max DeepPrime-FT: HEK293T, PE4max with Optimized scaffold
A549 PE2max-e DeepPrime-FT: A549, PE2max with Optimized scaffold and epegRNA
A549 PE2max DeepPrime-FT: A549, PE2max with Optimized scaffold
A549 PE4max-e DeepPrime-FT: A549, PE4max with Optimized scaffold and epegRNA
A549 PE4max DeepPrime-FT: A549, PE4max with Optimized scaffold
DLD1 NRCH_PE4max DeepPrime-FT: DLD1, NRCH-PE4max with Optimized scaffold
DLD1 PE2max DeepPrime-FT: DLD1, PE2max with Optimized scaffold
DLD1 PE4max DeepPrime-FT: DLD1, PE4max with Optimized scaffold
HCT116 PE2 DeepPrime-FT: HCT116, PE2 with Optimized scaffold
HeLa PE2max DeepPrime-FT: HeLa, PE2max with Optimized scaffold
MDA-MB-231 PE2 DeepPrime-FT: MDA-MB-231, PE2 with Optimized scaffold
NIH3T3 NRCH_PE4max DeepPrime-FT: NIH3T3, NRCH-PE4max with Optimized scaffold

Get ClinVar record and DeepPrime score using GenET

ClinVar database contains mutations that are clinically evaluated to be pathogenic and related to human diseases(Laudrum et al. NAR 2018). GenET utilized the NCBI efect module to access ClinVar records to retrieve related variant data such as the genomic sequence, position, and mutation pattern. Using this data, GenET designs and evaluates pegRNAs that target the variant using DeepPrime.

from genet import database as db

# Accession (VCV) or variantion ID is available
cv_record = db.GetClinVar('VCV000428864.3')

print(cv_record.seq()) # default context length = 60nt

>>> output: # WT sequence, Alt sequence
('GGTCACTCACCTGGAGTGAGCCCTGCTCCCCCCTGGCTCCTTCCCAGCCTGGGCATCCTTGAGTTCCAAGGCCTCATTCAGCTCTCGGAACATCTCGAAGCGCTCACGCCCACGGATCTGC',
 'GGTCACTCACCTGGAGTGAGCCCTGCTCCCCCCTGGCTCCTTCCCAGCCTGGGCATCCTTGTTCCAAGGCCTCATTCAGCTCTCGGAACATCTCGAAGCGCTCACGCCCACGGATCTGCAG')

In addition, various information other than the sequence can be obtained from the record.

# for example, variant length of the record
print(cv_record.alt_len)

>>> output:
2

Clinvar records obtained through this process is used to design all possible pegRNAs within the genet.predict module's pecv_score function.

from genet import database as db
from genet import predict as prd

cv_record = db.GetClinVar('VCV000428864.3')
prd.pecv_score(cv_record)