DeepPrime

Predict Prime editing efficiency

DeepPrime is a prediction model for evaluating prime editing guide RNAs (pegRNAs) that target specific sites for prime editing (Yu et al. Cell 2023). The DeepSpCas9 prediction score is calculated at the same time, which requires TensorFlow (version >=2.6). DeepPrime itself was developed with PyTorch.

How to Use DeepPrime

To use DeepPrime, prepare a DNA sequence containing the intended prime edit and its surrounding sequence context as input. The intended edit must be a single 1-3 nt substitution, insertion, or deletion; combining multiple edit types is not supported. The edit position is marked with parentheses, and the original and prime-edited sequences are separated by '/'.

Some example inputs are as follows:

# Example 1: 1bp substitution (T to A)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(T/A)CAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

# Example 2: 3bp insertion (CTT insertion)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(/CTT)TCAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

# Example 3: 2bp deletion (TC deletion)
input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(TC/)AGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'
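To make the notation concrete, here is a minimal sketch of how an input string could be split into its unedited and edited sequences. `parse_edit_notation` is a hypothetical helper written for illustration; it is not part of GenET, and DeepPrime's own parsing may differ. It assumes a substitution replaces the same number of bases it removes.

```python
import re

# Hypothetical helper, not part of GenET: it only illustrates the input notation.
_EDIT_PATTERN = re.compile(r"^([ACGT]*)\(([ACGT]*)/([ACGT]*)\)([ACGT]*)$")


def parse_edit_notation(input_seq):
    """Split 'LEFT(REF/ALT)RIGHT' into edit type, edit length, and both sequences."""
    match = _EDIT_PATTERN.match(input_seq.upper())
    if match is None:
        raise ValueError("expected exactly one edit written as (REF/ALT)")
    left, ref, alt, right = match.groups()

    if ref and alt:
        # Assumption: a substitution swaps bases one-for-one.
        if len(ref) != len(alt):
            raise ValueError("substitution must replace the same number of bases")
        edit_type, edit_len = "sub", len(ref)
    elif alt:
        edit_type, edit_len = "ins", len(alt)
    elif ref:
        edit_type, edit_len = "del", len(ref)
    else:
        raise ValueError("both sides of '/' are empty")

    if not 1 <= edit_len <= 3:
        raise ValueError("only 1-3 nt edits are supported")

    return {
        "edit_type": edit_type,
        "edit_len": edit_len,
        "wt_seq": left + ref + right,
        "edited_seq": left + alt + right,
    }


print(parse_edit_notation("ACGT(/CTT)TCAG")["edit_type"])  # ins
```

Running such a check before calling DeepPrime can catch malformed inputs, such as a missing '/' or an edit longer than 3 nt, early.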

Once the input is prepared as described above, you can use DeepPrime as follows. When you pass the target sequence and editing information to DeepPrime, it designs all possible pegRNAs for the given sequence and automatically calculates their corresponding biofeatures. You can check the calculated biofeatures using .features.

from genet.predict import DeepPrime

input_seq = 'CTCACGTGAGCTCTTTGAGCTTGCCTGTCTCTGTGGGCTGAAGGCTGTTCCCTGTTTCCT(T/A)CAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCAT'

pegrna = DeepPrime(input_seq)

# check designed pegRNAs
>>> pegrna.features.head()

	ID	Spacer	RT-PBS	PBS_len	RTT_len	RT-PBS_len	Edit_pos	Edit_len	RHA_len	Target	...	deltaTm_Tm4-Tm2	GC_count_PBS	GC_count_RTT	GC_count_RT-PBS	GC_contents_PBS	GC_contents_RTT	GC_contents_RT-PBS	MFE_RT-PBS-polyT	MFE_Spacer	DeepSpCas9_score
0	SampleName	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG	7	35	42	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...	...	-340.105	5	16	21	71.42857	45.71429	50	-10.4	-0.6	45.96754
1	SampleName	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG	8	35	43	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...	...	-340.105	6	16	22	75	45.71429	51.16279	-10.4	-0.6	45.96754
2	SampleName	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT	9	35	44	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...	...	-340.105	6	16	22	66.66667	45.71429	50	-10.4	-0.6	45.96754
3	SampleName	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG	10	35	45	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...	...	-340.105	7	16	23	70	45.71429	51.11111	-10.4	-0.6	45.96754
4	SampleName	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT	11	35	46	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...	...	-340.105	7	16	23	63.63636	45.71429	50	-10.4	-0.6	45.96754
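As an illustration of what some of these biofeatures mean, the GC columns in the first row can be reproduced by hand. This is only a sketch: `gc_count` and `gc_content` are hypothetical helpers, not GenET functions, and the slicing assumes the RT-PBS string is written 5'->3' as the RT template followed by the PBS.

```python
def gc_count(seq):
    """Number of G and C bases in a sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_content(seq):
    """GC content as a percentage of the sequence length."""
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * gc_count(seq) / len(seq)


# Values taken from the first row of the table above.
rt_pbs = "CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG"
pbs_len, rtt_len = 7, 35

# Assumption: RT-PBS = RT template (first rtt_len bases) + PBS (remaining bases).
rtt, pbs = rt_pbs[:rtt_len], rt_pbs[rtt_len:]
assert len(pbs) == pbs_len

print(gc_count(pbs), round(gc_content(pbs), 5))        # 5 71.42857
print(gc_count(rtt), round(gc_content(rtt), 5))        # 16 45.71429
print(gc_count(rt_pbs), round(gc_content(rt_pbs), 5))  # 21 50.0
```

The printed values agree with the GC_count_* and GC_contents_* columns of that row, which suggests the content columns are percentages of the corresponding sequence length.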

Next, select the PE system and cell type, then run DeepPrime:

pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T')

>>> pe2max_output.head()

	ID	PE2max_score	Spacer	RT-PBS	PBS_len	RTT_len	RT-PBS_len	Edit_pos	Edit_len	RHA_len	Target
0	SampleName	0.904387	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG	7	35	42	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
1	SampleName	2.375938	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG	8	35	43	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
2	SampleName	2.61238	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT	9	35	44	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
3	SampleName	3.641537	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG	10	35	45	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...
4	SampleName	3.768321	AAGACAACACCCTTGCCTTG	CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT	11	35	46	34	1	1	ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA...

Predicting efficiencies of existing pegRNAs

When the target, PBS, and RT template sequences are provided correctly, DeepPrimeGuideRNA predicts the DeepPrime score of the corresponding pegRNA. For example, assume we have the following target and pegRNA:

[Figure: prime_editing_complex]

To obtain the DeepPrime score of the pegRNA above, run the code as follows. As with the .predict method in DeepPrime, you can specify pe_system and cell_type.

from genet.predict import DeepPrimeGuideRNA

target    = 'TTTAAGGTTTCAGTTGACATTTGCAGGTTATAGTTCTTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCA'
pbs       = 'AATGTCAAC'
rtt       = 'AGAAACTGAGACGAACTATAACCTGCA'
edit_len  = 1
edit_pos  = 16
edit_type = 'sub'

pegrna = DeepPrimeGuideRNA('pegRNA_test', target=target, pbs=pbs, rtt=rtt, 
                           edit_len=edit_len, edit_pos=edit_pos, edit_type=edit_type)

pe2max_score = pegrna.predict('PE2max')
print(pe2max_score) # 8.23717212677002

The inputs for DeepPrimeGuideRNA are configured as follows:

Input	Type	Description
sID	str	Name of sample or pegRNA
target	str	Target sequence, 74 nt in length, with the protospacer (the region the guide sequence binds to) oriented 5'->3'. It must include 4 nt of additional context on the 5' side of the protospacer.
pbs	str	The PBS sequence from the pegRNA. Both T (DNA) and U (RNA) forms are acceptable.
rtt	str	The RT template sequence from the pegRNA. Both T (DNA) and U (RNA) forms are acceptable.
edit_len	int	Select one of 1, 2, or 3 according to the intended prime editing.
edit_pos	int	Select one from 1-40 according to the intended prime editing.
edit_type	str	Select one from 'sub', 'ins', 'del' according to the intended prime editing.
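The constraints in the table above can be checked before constructing a DeepPrimeGuideRNA object. The following is a minimal sketch; `check_guide_inputs` is a hypothetical helper, not part of GenET, and the library may perform its own validation differently.

```python
VALID_EDIT_TYPES = {"sub", "ins", "del"}


def to_dna(seq):
    """Convert an RNA-form sequence (U) to DNA form (T)."""
    return seq.upper().replace("U", "T")


def check_guide_inputs(target, pbs, rtt, edit_len, edit_pos, edit_type):
    """Hypothetical pre-check mirroring the input table; returns DNA-form PBS and RTT."""
    if len(target) != 74:
        raise ValueError(f"target must be 74 nt, got {len(target)}")
    if edit_len not in (1, 2, 3):
        raise ValueError("edit_len must be 1, 2, or 3")
    if not 1 <= edit_pos <= 40:
        raise ValueError("edit_pos must be between 1 and 40")
    if edit_type not in VALID_EDIT_TYPES:
        raise ValueError("edit_type must be 'sub', 'ins', or 'del'")
    # Both T and U forms are acceptable, so normalise to DNA for consistency.
    return to_dna(pbs), to_dna(rtt)


target = "TTTAAGGTTTCAGTTGACATTTGCAGGTTATAGTTCTTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCA"
pbs, rtt = check_guide_inputs(
    target, pbs="AAUGUCAAC", rtt="AGAAACTGAGACGAACTATAACCTGCA",
    edit_len=1, edit_pos=16, edit_type="sub",
)
print(pbs)  # AATGTCAAC
```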

Currently available DeepPrime models:

Cell type	PE system	Model
HEK293T	PE2	DeepPrime_base
HEK293T	NRCH_PE2	DeepPrime-FT: HEK293T, NRCH-PE2 with Optimized scaffold
HEK293T	NRCH_PE2max	DeepPrime-FT: HEK293T, NRCH-PE2max with Optimized scaffold
HEK293T	PE2	DeepPrime-FT: HEK293T, PE2 with Conventional scaffold
HEK293T	PE2max-e	DeepPrime-FT: HEK293T, PE2max with Optimized scaffold and epegRNA
HEK293T	PE2max	DeepPrime-FT: HEK293T, PE2max with Optimized scaffold
HEK293T	PE4max-e	DeepPrime-FT: HEK293T, PE4max with Optimized scaffold and epegRNA
HEK293T	PE4max	DeepPrime-FT: HEK293T, PE4max with Optimized scaffold
A549	PE2max-e	DeepPrime-FT: A549, PE2max with Optimized scaffold and epegRNA
A549	PE2max	DeepPrime-FT: A549, PE2max with Optimized scaffold
A549	PE4max-e	DeepPrime-FT: A549, PE4max with Optimized scaffold and epegRNA
A549	PE4max	DeepPrime-FT: A549, PE4max with Optimized scaffold
DLD1	NRCH_PE4max	DeepPrime-FT: DLD1, NRCH-PE4max with Optimized scaffold
DLD1	PE2max	DeepPrime-FT: DLD1, PE2max with Optimized scaffold
DLD1	PE4max	DeepPrime-FT: DLD1, PE4max with Optimized scaffold
HCT116	PE2	DeepPrime-FT: HCT116, PE2 with Optimized scaffold
HeLa	PE2max	DeepPrime-FT: HeLa, PE2max with Optimized scaffold
MDA-MB-231	PE2	DeepPrime-FT: MDA-MB-231, PE2 with Optimized scaffold
NIH3T3	NRCH_PE4max	DeepPrime-FT: NIH3T3, NRCH-PE4max with Optimized scaffold
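Because not every PE system is available for every cell type, it can help to check a combination before calling .predict. The sketch below simply transcribes the table above into a dictionary; `AVAILABLE_MODELS`, `is_model_available`, and `systems_for` are hypothetical names for illustration and are not provided by GenET.

```python
# Transcribed from the model table above (cell type -> PE systems).
AVAILABLE_MODELS = {
    "HEK293T": {"PE2", "NRCH_PE2", "NRCH_PE2max", "PE2max-e", "PE2max",
                "PE4max-e", "PE4max"},
    "A549": {"PE2max-e", "PE2max", "PE4max-e", "PE4max"},
    "DLD1": {"NRCH_PE4max", "PE2max", "PE4max"},
    "HCT116": {"PE2"},
    "HeLa": {"PE2max"},
    "MDA-MB-231": {"PE2"},
    "NIH3T3": {"NRCH_PE4max"},
}


def is_model_available(cell_type, pe_system):
    """Return True if the table lists a model for this combination."""
    return pe_system in AVAILABLE_MODELS.get(cell_type, set())


def systems_for(cell_type):
    """Return the PE systems listed for a cell type, sorted alphabetically."""
    return sorted(AVAILABLE_MODELS.get(cell_type, set()))


print(is_model_available("HEK293T", "PE2max"))  # True
print(is_model_available("HeLa", "PE4max"))     # False
print(systems_for("DLD1"))  # ['NRCH_PE4max', 'PE2max', 'PE4max']
```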

Get ClinVar record and DeepPrime score using GenET

The ClinVar database contains mutations that have been clinically evaluated as pathogenic and related to human diseases (Landrum et al. NAR 2018). GenET uses the NCBI efetch module to access ClinVar records and retrieve related variant data such as the genomic sequence, position, and mutation pattern. Using this data, GenET designs pegRNAs that target the variant and evaluates them with DeepPrime.

from genet import database as db

# Either an accession (VCV) or a variation ID can be used
cv_record = db.GetClinVar('VCV000428864.3')

print(cv_record.seq()) # default context length = 60nt

>>> output: # WT sequence, Alt sequence
('GGTCACTCACCTGGAGTGAGCCCTGCTCCCCCCTGGCTCCTTCCCAGCCTGGGCATCCTTGAGTTCCAAGGCCTCATTCAGCTCTCGGAACATCTCGAAGCGCTCACGCCCACGGATCTGC',
 'GGTCACTCACCTGGAGTGAGCCCTGCTCCCCCCTGGCTCCTTCCCAGCCTGGGCATCCTTGTTCCAAGGCCTCATTCAGCTCTCGGAACATCTCGAAGCGCTCACGCCCACGGATCTGCAG')

In addition, various information other than the sequence can be obtained from the record.

# for example, variant length of the record
print(cv_record.alt_len)

>>> output:
2

ClinVar records obtained this way are used to design all possible pegRNAs with the pecv_score function in the genet.predict module.

from genet import database as db
from genet import predict as prd

cv_record = db.GetClinVar('VCV000428864.3')
prd.pecv_score(cv_record)