Title:
NUCLEASES COMPRISING CELL PENETRATING PEPTIDE SEQUENCES
Document Type and Number:
WIPO Patent Application WO/2022/109058
Kind Code:
A1
Abstract:
The present disclosure provides systems, methods, and compositions for the delivery of one or more components of a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas (CRISPR associated protein) gene editing system.

Inventors:
SETHURAMAN NATARAJAN (US)
Application Number:
PCT/US2021/059773
Publication Date:
May 27, 2022
Filing Date:
November 17, 2021
Assignee:
ENTRADA THERAPEUTICS INC (US)
International Classes:
A61K38/03; C07K4/00; C07K14/315; C12N9/22
Domestic Patent References:
WO2016205613A12016-12-22
Other References:
RAMAKRISHNA ET AL.: "Gene disruption by cell -penetrating peptide-mediated delivery of Cas9 protein and guide RNA", GENOME RESEARCH, vol. 24, no. 6, 2 April 2014 (2014-04-02), pages 1020 - 1027, XP055692365, DOI: 10.1101/gr.171264.113
CHEN ET AL.: "Engineering Cell -Permeable Proteins through Insertion of Cell -Penetrating Motifs into Surface Loops", ACS CHEM. BIOL., vol. 15, 3 August 2020 (2020-08-03), pages 2568 - 2576, XP055837925, DOI: 10.1021/acschembio.0c00593
Attorney, Agent or Firm:
CAMPBELL, Keith M. (US)
Claims:

1. A construct comprising at least one component of a CRISPR-Cas gene editing system and at least one cell penetrating peptide (CPP) sequence, wherein:

(a) the component of a CRISPR-Cas gene editing system comprises a nuclease comprising one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region;

(b) the component of a CRISPR-Cas gene editing system comprises a nuclease to which at least one CPP sequence is conjugated;

(c) the component of a CRISPR-Cas gene editing system comprises a gRNA to which at least one CPP sequence is conjugated; or

(d) a combination of any of (a), (b) or (c).

2. The construct of claim 1, wherein the nuclease comprises one or more loop regions, and the at least one loop region comprises a CPP sequence inserted into the loop region.

3. The construct of claim 1, wherein the at least one CPP sequence is conjugated to the nuclease.

4. The construct of claim 1, wherein:

(a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; and

(b) at least one CPP sequence is conjugated to the nuclease.

5. The construct of any one of claims 1-3, wherein the looped nuclease comprises a zinc-finger nuclease, meganuclease, transcription activator-like effector nuclease (TALEN), RNA nuclease, DNA nuclease, or CRISPR/Cas nuclease.

6. The construct of claim 4, wherein the CRISPR/Cas nuclease is Cas9, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, Cas14, or a variant or fragment thereof.

7. The construct of any one of claims 1-6, wherein the nuclease is Cas9 or a Cas9 variant.

SUBSTITUTE SHEET (RULE 26)

8. The construct of any one of claims 1 or 3-7, wherein the at least one CPP sequence is conjugated to an N-terminus of the nuclease, to a C-terminus of the nuclease, to a side chain of an amino acid residue of the nuclease, or a combination thereof.

9. The construct of any one of claims 1 or 3-8, wherein the at least one CPP sequence is conjugated to the N-terminus of the nuclease, to the C-terminus of the nuclease, or a combination thereof.

10. The construct of any one of claims 1 or 3-8, wherein the at least one CPP sequence is conjugated to a side chain of an amino acid residue of the nuclease.

11. The construct of any one of claims 1 or 3-8, wherein the side chain is the side chain of a residue of lysine, glutamine, glutamic acid, asparagine, or aspartic acid.

12. The construct of claim 11, wherein the side chain is the side chain of a residue of lysine.

13. The construct of any one of claims 1-12, wherein the component of the CRISPR-Cas gene editing system comprises a guide RNA sequence.

14. The construct of claim 13, wherein one or more CPP sequence is conjugated to a 5’ end of the guide RNA sequence, a 3’ end of the guide RNA sequence, or on a backbone of the guide RNA sequence.

15. The construct of any one of claims 1-12, comprising 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 CPPs.

16. The construct of any one of claims 3-15, wherein the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence.

17. The construct of claim 16, comprising a linker, wherein the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence conjugated through the linker.

18. The construct of claim 17, wherein the linker is a bivalent or trivalent C1-C50 saturated or unsaturated, straight or branched alkyl, wherein 1-25 methylene groups are optionally and independently replaced by -N(H)-, -N(C1-C4 alkyl)-, -N(cycloalkyl)-, -O-, -C(O)-, -C(O)O-, -S-, -S(O)-, -S(O)2-, -S(O)2N(C1-C4 alkyl)-, -S(O)2N(cycloalkyl)-, -N(H)C(O)-, -N(C1-C4 alkyl)C(O)-, -N(cycloalkyl)C(O)-, -C(O)N(H)-, -C(O)N(C1-C4 alkyl), -C(O)N(cycloalkyl), aryl, heteroaryl, cycloalkyl, or cycloalkenyl.

19. The construct of any one of claims 1-18, wherein the CPP comprises at least two arginine residues.

20. The construct of any one of claims 1-19, wherein the CPP comprises from two to six arginine residues.

21. The construct of any one of claims 1-20, wherein the CPP comprises at least one amino acid residue that comprises a hydrophobic side chain.

22. The construct of any one of claims 1-21, wherein the CPP comprises from one to six amino acid residues which independently comprise a hydrophobic side chain.

23. The construct of claim 22, wherein the amino acid residues comprising a hydrophobic side chain are residues of glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents.

24. The construct of any one of claims 21-23, wherein at least one of the amino acid residues comprising a hydrophobic side chain is a residue of tryptophan or phenylalanine.

25. The construct of any one of claims 21-24, wherein at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue.

26. The construct of any one of claims 21-24, wherein at least one of the amino acid residues comprising a hydrophobic side chain is a phenylalanine residue.

27. The construct of any one of claims 21-24, wherein each of the at least one of the amino acids comprising a hydrophobic side chain is tryptophan.

28. The construct of any one of claims 1-25, wherein the CPP sequence comprises at least three arginine residues and at least three tryptophan residues.

29. The construct of any one of claims 1-28, wherein the CPP sequence in at least one loop region of the nuclease comprises at least three arginine residues and at least three tryptophan residues.

30. The construct of any one of claims 1-28, wherein the nuclease comprises a first looped region and a second looped region, wherein a first CPP sequence is inserted into the first looped region, and a second CPP sequence is inserted into the second looped region.

31. The construct of claim 30, wherein the first CPP comprises at least three arginine residues, and the second CPP comprises at least three amino acid residues each of which independently comprises a hydrophobic side chain.

32. The construct of any one of claims 1-31, wherein the CPP sequence comprises from one to six residues of D-amino acids.

33. The construct of claim 32, wherein the one or more D-amino acid residues are arginine.

34. The construct of claim 32, wherein the one or more D-amino acid residues are residues of amino acids comprising a hydrophobic side chain.

35. The construct of claim 34, wherein the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of phenylalanine.

36. The construct of claim 32, wherein the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of naphthylalanine.

37. The construct of any one of claims 1-36, further comprising an exocyclic peptide (EP), wherein the EP is conjugated to the nuclease, guide RNA sequence, or combination thereof.

38. The construct of claim 37, wherein the exocyclic peptide (EP) is conjugated to the linker that conjugates the CPP to the nuclease, guide RNA sequence, or combination thereof.

39. The construct of claim 37 or 38, wherein the EP comprises from 2 to 10 amino acid residues.

40. The construct of claim 39, wherein the EP comprises from 4 to 8 amino acid residues.

41. The construct of any one of claims 38-40 wherein the EP comprises 1 or 2 arginine residues.

42. The construct of any one of claims 38-41, wherein the EP comprises 1, 2, 3, or 4 lysine residues.

43. The construct of claim 42, wherein the amino group on the side chain of each lysine residue is substituted with a trifluoroacetyl (-COCF3) group, allyloxycarbonyl (Alloc), 1-(4,4-dimethyl-2,6-dioxocyclohexylidene)ethyl (Dde), or (4,4-dimethyl-2,6-dioxocyclohex-1-ylidene-3)-methylbutyl (ivDde) group.

44. The construct of any of claims 38-43, wherein the EP comprises at least 2 amino acid residues with a hydrophobic side chain.

45. The construct of claim 44, wherein the amino acid residue with a hydrophobic side chain is selected from valine, proline, alanine, leucine, isoleucine, and methionine.

46. The construct of any one of claims 38-45, wherein the exocyclic peptide comprises one of the following sequences: PKKKRKV; KR; RR; KKK; KGK; KBK; KBR; KRK; KRR; RKK; RRR; KKKK; KKRK; KRKK; KRRK; RKKR; RRRR; KGKK; KKGK; KKKKK; KKKRK; KBKBK; KKKRKV; PGKKRKV; PKGKRKV; PKKGRKV; PKKKGKV; PKKKRGV; or PKKKRKG.

47. The construct of any one of claims 38-46, wherein the exocyclic peptide has the structure: Ac-P-K-K-K-R-K-V-.

48. The construct of any one of claims 1-47, wherein each CPP sequence independently comprises a sequence from Table D.

49. The construct of any one of the preceding claims, comprising a detectable tag.

50. The construct of claim 49, wherein the detectable tag is selected from a FLAG tag, a polyhistidine tag, a SNAP tag, a Halo tag, cMyc, glutathione-S-transferase, avidin, an enzyme, a fluorescent protein, a luminescent protein, a chemiluminescent protein, a bioluminescent protein, and a phosphorescent protein.

51. A recombinant nucleic acid molecule encoding the construct of any one of claims 1-50.

52. An expression cassette comprising the recombinant nucleic acid of claim 51 operably linked to a promoter.

53. A vector comprising the expression cassette of claim 52.

54. A host cell comprising the vector of claim 53.

55. The host cell of claim 54, wherein the host cell is selected from a Chinese Hamster Ovary (CHO) cell, an HEK 293 cell, a BHK cell, a murine NS0 cell, a murine SP2/0 cell, or an E. coli cell.

56. A composition comprising a construct of any one of claims 1-50.

57. A method of producing the construct of any one of claims 1-50, comprising culturing the host cell of claim 55 and purifying the expressed modified looped nuclease from the supernatant.

58. A method of treating a disease or condition in a patient in need thereof, comprising administering a construct of any one of claims 1-50 to the patient.

59. A method of gene editing, comprising administering a construct of any one of claims 1-50 to a cell.

60. The method of claim 59, comprising upregulating target DNA.

61. The method of claim 59, comprising upregulating target RNA.

62. The method of claim 60, comprising downregulating target DNA.

63. The method of claim 60, comprising downregulating target RNA.

Description:
NUCLEASES COMPRISING CELL PENETRATING PEPTIDE SEQUENCES

Description Of The Text File Submitted Electronically

[0001] The contents of the text file submitted electronically herewith are incorporated herein by reference in their entirety: A computer readable format copy of the Sequence Listing (filename: CYPT_031_01WO_SeqList_ST25.txt, date recorded: November 17, 2021, file size ~125 kilobytes).

Background

[0002] Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR associated proteins) is a prokaryotic RNA-guided adaptive immune system that was identified in archaea and bacteria and has been adapted for gene editing.

[0003] Discovery of the CRISPR-Cas systems has revolutionized modern molecular biology. The system is highly specific in terms of recognized sequence and this specificity can be easily altered by modifying the sequence coding the guide RNA. The range of applications for CRISPR-Cas systems can be further expanded by modifying the Cas proteins themselves.

[0004] CRISPR-Cas gene editing systems generally include two components: a CRISPR associated (Cas) nuclease and a guide RNA (gRNA). The gRNA can be programmed to recognize a nucleic acid sequence via a “spacer” sequence of about 18 to about 22 nucleotides at the 5’ end of the gRNA. The gRNA forms a ribonuclease complex with the Cas nuclease. Upon encountering a complementary nucleotide sequence, the spacer region of the gRNA forms Watson-Crick base pairs with the target nucleic acid sequence, enabling the Cas nuclease to precisely cleave the nucleic acid at the target sequence. Many types of CRISPR-Cas gene systems have been identified and can be classified into three major types (I, II, and III) plus a less common but clearly distinct type IV. Some CRISPR-Cas systems target DNA, others target RNA, and others can target both DNA and RNA.
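The spacer-to-target base pairing described in paragraph [0004] can be illustrated with a minimal Python sketch. It is offered only as an illustration and is not part of the disclosed methods. The function names, the strict 18 to 22 nucleotide length check (the text says "about"), and the example sequences are all assumptions made here; the sketch also ignores PAM requirements and mismatch tolerance.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna))


def find_protospacers(spacer_rna: str, dna: str) -> list[tuple[int, str]]:
    """List (0-based start, strand) for exact spacer matches in a DNA sequence.

    The spacer pairs with the target strand, so the opposite strand reads
    like the spacer with T in place of U. Both strands of `dna` are searched.
    Assumption: the length window is enforced strictly as 18-22 nt.
    """
    if not 18 <= len(spacer_rna) <= 22:
        raise ValueError("spacer length outside the assumed 18-22 nt window")
    protospacer = spacer_rna.upper().replace("U", "T")
    dna = dna.upper()
    hits = []
    for strand, query in (("+", protospacer), ("-", reverse_complement(protospacer))):
        start = dna.find(query)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(query, start + 1)
    return sorted(hits)


# Hypothetical 20-nt spacer and a short hypothetical DNA fragment.
spacer = "GACUGACUGACUGACUGACU"
fragment = "TTT" + "GACTGACTGACTGACTGACT" + "AAA"
print(find_protospacers(spacer, fragment))
```

A hit on the "-" strand means the matching sequence lies on the reverse complement of the supplied fragment.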

[0005] Effective delivery of the components of a CRISPR-Cas gene editing system into the cytosol and nucleus of mammalian cells would open the door to a wide range of applications including treatment of many currently intractable diseases. However, effective delivery in a clinical setting is yet to be accomplished and has been hampered by lack of cell permeability. Many attempts have been made to improve cell permeability, including protein surface engineering, incorporation into nanoparticle carriers, and attachment of cell-penetrating peptides. However, these approaches generally have poor cytosolic delivery efficiency, with most cargo entrapped inside the endosomal/lysosomal compartments. Therefore, additional strategies for enhancing the cell-permeability of the components of a CRISPR-Cas gene editing system for a variety of therapeutic and research purposes are needed.

Summary

[0006] In embodiments, the present disclosure provides a construct comprising at least one component of a CRISPR-Cas gene editing system and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the component of a CRISPR-Cas gene editing system comprises a nuclease comprising one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; (b) the component of a CRISPR-Cas gene editing system comprises a nuclease to which at least one CPP sequence is conjugated; (c) the component of a CRISPR-Cas gene editing system comprises a gRNA to which at least one CPP sequence is conjugated; or (d) a combination of any of (a), (b) or (c).

[0007] In embodiments, the nuclease comprises one or more loop regions, and the at least one loop region comprises a CPP sequence inserted into the loop region. In some embodiments, the at least one CPP sequence is conjugated to the nuclease.

[0008] In embodiments, (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; and (b) at least one CPP sequence is conjugated to the nuclease.

[0009] In embodiments, the looped nuclease is a zinc-finger nuclease, meganuclease, transcription activator-like effector nuclease (TALEN), RNA nuclease, DNA nuclease, or CRISPR/Cas nuclease. In embodiments, the CRISPR/Cas nuclease is Cas9, a Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14. In embodiments, the nuclease is Cas9 or a Cas9 variant.

[0010] In embodiments, the at least one CPP sequence is conjugated to an N-terminus of the nuclease, to a C-terminus of the nuclease, to a side chain of an amino acid residue of the nuclease, or a combination thereof. In embodiments, the at least one CPP sequence is conjugated to the N-terminus of the nuclease, to the C-terminus of the nuclease, or a combination thereof.

[0011] In embodiments, the at least one CPP sequence is conjugated to a side chain of an amino acid residue of the nuclease. In embodiments, the side chain is the side chain of a residue of lysine, glutamine, glutamic acid, asparagine, or aspartic acid. In embodiments, the side chain is the side chain of a residue of lysine.

[0012] In embodiments, the component of the CRISPR-Cas gene editing system comprises a guide RNA sequence. In embodiments, one or more CPP sequence is conjugated to a 5’ end of the guide RNA sequence, a 3’ end of the guide RNA sequence, or on a backbone of the guide RNA sequence. In embodiments, the construct comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 CPPs.

[0013] In embodiments, the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence.

[0014] In embodiments, the construct comprises a linker, wherein the at least one CPP sequence conjugated to the nuclease, the guide RNA sequence, or a combination thereof is a cyclic CPP sequence conjugated through the linker. In embodiments, the linker is a bivalent or trivalent C1-C50 saturated or unsaturated, straight or branched alkyl, wherein 1-25 methylene groups are optionally and independently replaced by -N(H)-, -N(C1-C4 alkyl)-, -N(cycloalkyl)-, -O-, -C(O)-, -C(O)O-, -S-, -S(O)-, -S(O)2-, -S(O)2N(C1-C4 alkyl)-, -S(O)2N(cycloalkyl)-, -N(H)C(O)-, -N(C1-C4 alkyl)C(O)-, -N(cycloalkyl)C(O)-, -C(O)N(H)-, -C(O)N(C1-C4 alkyl), -C(O)N(cycloalkyl), aryl, heteroaryl, cycloalkyl, or cycloalkenyl.

[0015] In embodiments, the CPP comprises at least two arginine residues. In embodiments, the CPP comprises from two to six arginine residues.

[0016] In embodiments, the CPP comprises at least one amino acid residue that comprises a hydrophobic side chain. In embodiments, the CPP comprises from one to six amino acid residues which independently comprise a hydrophobic side chain. In embodiments, the amino acid residues comprising a hydrophobic side chain are residues of glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a residue of tryptophan or phenylalanine. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a phenylalanine residue. In embodiments, each of the at least one of the amino acids comprising a hydrophobic side chain is tryptophan.

[0017] In embodiments, the CPP sequence comprises at least three arginine residues and at least three tryptophan residues. In embodiments, the CPP sequence in at least one loop region of the nuclease comprises at least three arginine residues and at least three tryptophan residues.
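The residue-count conditions in paragraph [0017] can be checked mechanically for a candidate sequence. The following Python sketch is illustrative only; the function name and the example peptides are hypothetical and are not taken from Table D. It assumes one-letter amino acid codes, which cannot represent D-amino acids or non-natural residues such as naphthylalanine.

```python
from collections import Counter


def meets_arg_trp_minimum(cpp: str, min_arg: int = 3, min_trp: int = 3) -> bool:
    """Check that a one-letter peptide string has enough Arg (R) and Trp (W).

    Defaults mirror the "at least three arginine residues and at least
    three tryptophan residues" condition. Residue order is not considered.
    """
    counts = Counter(cpp.upper())
    return counts["R"] >= min_arg and counts["W"] >= min_trp


# Hypothetical example peptides, used only to exercise the check.
print(meets_arg_trp_minimum("RRRWWW"))
print(meets_arg_trp_minimum("RRWWWG"))
```

The same helper can be called with other thresholds, for example `min_arg=2, min_trp=0`, to mirror the "at least two arginine residues" condition of paragraph [0015].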

[0018] In embodiments, the nuclease comprises a first looped region and a second looped region, wherein a first CPP sequence is inserted into the first looped region, and a second CPP sequence is inserted into the second looped region. In embodiments, the first CPP comprises at least three arginine residues, and the second CPP comprises at least three amino acid residues each of which independently comprises a hydrophobic side chain.

[0019] In embodiments, the CPP sequence comprises from one to six residues of D-amino acids. In embodiments, the one or more D-amino acid residues are arginine. In embodiments, the one or more D-amino acid residues are residues of amino acids comprising a hydrophobic side chain. In embodiments, the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of phenylalanine. In embodiments, the one or more of the residues of amino acids comprising a hydrophobic side chain is a residue of naphthylalanine.

[0020] In embodiments, the construct comprises an exocyclic peptide (EP), wherein the EP is conjugated to the nuclease, guide RNA sequence, or combination thereof. In embodiments, the exocyclic peptide (EP) is conjugated to a linker that conjugates the CPP to the nuclease, guide RNA sequence, or combination thereof. In embodiments, the EP comprises from 2 to 10 amino acid residues. In embodiments, the EP comprises from 4 to 8 amino acid residues.

[0021] In embodiments, the EP comprises 1 or 2 arginine residues. In embodiments, the EP comprises 1, 2, 3, or 4 lysine residues. In embodiments, the amino group on the side chain of each lysine residue is substituted with a trifluoroacetyl (-COCF3) group, allyloxycarbonyl (Alloc), 1-(4,4-dimethyl-2,6-dioxocyclohexylidene)ethyl (Dde), or (4,4-dimethyl-2,6-dioxocyclohex-1-ylidene-3)-methylbutyl (ivDde) group. In embodiments, the EP comprises at least 2 amino acid residues with a hydrophobic side chain. In embodiments, the amino acid residue with a hydrophobic side chain is valine, proline, alanine, leucine, isoleucine, or methionine.

[0022] In embodiments, the exocyclic peptide comprises one of the following sequences: PKKKRKV; KR; RR; KKK; KGK; KBK; KBR; KRK; KRR; RKK; RRR; KKKK; KKRK; KRKK; KRRK; RKKR; RRRR; KGKK; KKGK; KKKKK; KKKRK; KBKBK; KKKRKV; PGKKRKV; PKGKRKV; PKKGRKV; PKKKGKV; PKKKRGV; or PKKKRKG. In embodiments, the exocyclic peptide has the structure: Ac-P-K-K-K-R-K-V-.
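As an illustration of the "comprises one of the following sequences" language in paragraph [0022], the short Python sketch below reports which of the listed sequences occur as a substring of a candidate exocyclic peptide. The function name is hypothetical, the check is a plain string match on the one-letter codes exactly as listed, and no interpretation of the individual letter codes is assumed.

```python
# Sequences copied from the list in paragraph [0022], in the listed order.
EP_SEQUENCES = (
    "PKKKRKV", "KR", "RR", "KKK", "KGK", "KBK", "KBR", "KRK", "KRR", "RKK",
    "RRR", "KKKK", "KKRK", "KRKK", "KRRK", "RKKR", "RRRR", "KGKK", "KKGK",
    "KKKKK", "KKKRK", "KBKBK", "KKKRKV", "PGKKRKV", "PKGKRKV", "PKKGRKV",
    "PKKKGKV", "PKKKRGV", "PKKKRKG",
)


def matching_ep_sequences(peptide: str) -> list[str]:
    """Return every listed sequence that occurs within the peptide string."""
    peptide = peptide.upper()
    return [seq for seq in EP_SEQUENCES if seq in peptide]


print(matching_ep_sequences("PKKKRKV"))
```

Because short entries such as KR are on the list, a longer peptide will usually match several entries at once; an empty result means none of the listed sequences is present.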

[0023] In embodiments, each CPP sequence independently comprises a sequence from Table D.

[0024] In embodiments, the construct comprises a detectable tag. In embodiments, the detectable tag is a FLAG tag, a polyhistidine tag, a SNAP tag, a Halo tag, cMyc, glutathione-S-transferase, avidin, an enzyme, a fluorescent protein, a luminescent protein, a chemiluminescent protein, a bioluminescent protein, or a phosphorescent protein.

[0025] In embodiments, the present disclosure provides a recombinant nucleic acid molecule encoding a construct as disclosed herein. In embodiments, the construct is operably linked to a promoter.

[0026] In embodiments, the present disclosure provides a vector comprising an expression cassette encoding one or more components of a CRISPR-Cas gene editing system.

[0027] In embodiments, the present disclosure provides a host cell comprising a vector disclosed herein. In embodiments, the host cell is a Chinese Hamster Ovary (CHO) cell, an HEK 293 cell, a BHK cell, a murine NS0 cell, a murine SP2/0 cell, or an E. coli cell.

[0028] In embodiments, the present disclosure provides a composition comprising a construct as disclosed herein.

[0029] In embodiments, the present disclosure provides a method of producing a construct as disclosed herein, comprising culturing a host cell disclosed herein and purifying an expressed modified looped nuclease from the supernatant.

[0030] In embodiments, the present disclosure provides a method of treating a disease or condition, comprising administering a construct as disclosed herein.

[0031] In embodiments, the present disclosure provides a method of gene editing, comprising administering a construct as disclosed herein. In embodiments, the method comprises upregulating target DNA. In embodiments, the method comprises upregulating expression of a target RNA. In embodiments, the method comprises downregulating target DNA. In embodiments, the method comprises downregulating expression of a target RNA.

Brief Description Of The Drawings

[0032] FIG. 1 shows the secondary structure of Cas9 from Streptococcus pyogenes serotype M1 (SEQ ID NO: 1). Beta strands are italicized and bold. Loops are double underlined. Helices are underlined with a squiggly line.

Detailed Description

[0033] Disclosed herein are systems, methods, or compositions for the delivery and therapeutic applications of one or more components of a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas (CRISPR associated protein) gene editing system. In embodiments, a system, method or composition is provided that facilitates the delivery of a Cas nuclease, a guide RNA (gRNA), or a combination thereof.

[0034] In embodiments, the disclosure provides a construct comprising a component of a CRISPR-Cas gene editing system and a peptide sequence which allows for the component of the CRISPR-Cas gene editing system to penetrate the cell membrane and deliver the component of the CRISPR-Cas gene editing system intracellularly. In embodiments, the disclosure provides nucleases comprising an exogenous peptide sequence which allows for the nuclease to penetrate the cell membrane and deliver the nuclease intracellularly. In embodiments, the disclosure provides a construct comprising a nuclease and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; (b) at least one CPP sequence is conjugated to the nuclease; or (c) a combination of (a) and (b). In embodiments, the disclosure provides a construct that comprises a guide RNA (gRNA) and at least one peptide sequence, for example, at least one cell penetrating peptide (CPP) which allows the gRNA to penetrate the cell membrane and deliver the gRNA intracellularly. In embodiments, the disclosure provides a composition comprising (1) a construct comprising a nuclease and at least one cell penetrating peptide (CPP) sequence, wherein: (a) the nuclease comprises one or more loop regions, and at least one loop region comprises a CPP sequence inserted into the loop region; (b) at least one CPP sequence is conjugated to the nuclease; or (c) a combination of (a) and (b); and (2) a construct that comprises a guide RNA (gRNA) and at least one peptide sequence, for example, at least one cell penetrating peptide (CPP); or (3) a combination of (1) and (2).

[0035] In embodiments, the disclosure provides a construct comprising at least one expression vector that encodes at least one component of a CRISPR-Cas gene editing system and at least one peptide sequence which allows for the expression vector to penetrate the cell membrane and deliver the expression vector intracellularly. In embodiments, the expression vector encodes at least one CRISPR-associated nuclease. In embodiments, the expression vector encodes at least one gRNA. In embodiments, the expression vector encodes at least one CRISPR-associated nuclease and at least one gRNA.

[0036] In embodiments, the present disclosure provides polynucleotides encoding the constructs described herein and methods for the production of the constructs described herein.

[0037] The compositions and methods for insertion of CPP motifs (also referred to herein as “CPP sequences”) into the loops of nucleases or conjugating CPP to nucleases and/or guide sequences, as described herein, represent a general approach to endowing cell permeability to a component of a CRISPR-Cas gene editing system that would otherwise be cell-impermeable. This approach offers a number of advantages over previous methods, not the least of which is its simplicity. Additionally or alternatively, conjugation of a CPP to a nuclease and/or guide RNA sequence can further improve cell delivery efficiency of the disclosed constructs. Compared to other protein surface remodeling methods such as supercharging (Cronican et al., (2010) Potent Delivery of Functional Proteins into Mammalian Cells in Vitro and in Vivo Using a Supercharged Protein. ACS Chem. Biol. 5, 747-752; and Fuchs et al., (2007) Arginine Grafting to Endow Cell-Permeability. ACS Chem. Biol. 2, 167-170) and esterification (Mix et al., (2017) Cytosolic Delivery of Proteins by Bioreversible Esterification. J. Am. Chem. Soc. 139, 14396-14398), the methods described herein involve relatively minor changes to the structure of the CRISPR-Cas component and should be applicable to a broad range of nucleases and gRNAs. Without being bound by theory, the modified nucleases described herein are expected to be less immunogenic than other nucleases modified by other protein surface remodeling methods. Additionally, the CPP motifs grafted to protein loops are structurally constrained and relatively stable against proteolytic degradation.

[0038] General methods in molecular and cellular biochemistry for producing recombinant proteins (nucleases) can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference. General methods for conjugating proteins and oligonucleotides are described in U.S. Pat. No. 10,626,147 and International Patent Application Pub. No. PCT/US2020/066459, the disclosures of which are incorporated herein by reference.

Definitions

[0039] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of particular embodiments, preferred embodiments of compositions, methods and materials are described herein. For the purposes of the present disclosure, the following terms are defined below. Additional definitions are set forth throughout this disclosure.

[0040] The articles “a,” “an,” and “the” are used herein to refer to one or to more than one (i.e., to at least one, or to one or more) of the grammatical object of the article. By way of example, “an element” means one element or one or more elements.

[0041] As used herein, the term “about” or “approximately” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by acceptable levels in the art. In embodiments, the amount of variation may be as much as 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1% to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length. In one embodiment, the term “about” or “approximately” refers to a range of quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length ± 15%, ± 10%, ± 9%, ± 8%, ± 7%, ± 6%, ± 5%, ± 4%, ± 3%, ± 2%, or ± 1% about a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
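The percentage-based reading of “about” in paragraph [0041] amounts to a simple tolerance test. The Python sketch below is illustrative only; the function name and the default of 15% (the widest variation listed) are choices made here, not terms of the definition.

```python
def within_about(value: float, reference: float, tolerance_pct: float = 15.0) -> bool:
    """Return True if value lies within reference ± tolerance_pct percent.

    tolerance_pct would be one of the listed variations (15, 10, 9, ... 1);
    the default of 15 is an assumption chosen for this sketch.
    """
    allowed = abs(reference) * tolerance_pct / 100.0
    return abs(value - reference) <= allowed


# Example: is 23 "about 20" at ±15%, and is 21 "about 20" at ±4%?
print(within_about(23, 20))
print(within_about(21, 20, tolerance_pct=4.0))
```

With a reference of 20, a 15% tolerance admits values from 17 to 23 inclusive, whereas a 4% tolerance admits only 19.2 to 20.8.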

[0042] A numerical range, e.g., 1 to 5, about 1 to 5, or about 1 to about 5, refers to each numerical value encompassed by the range. For example, in one non-limiting and merely illustrative embodiment, the range “1 to 5” is equivalent to the expression 1, 2, 3, 4, 5; or 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0; or 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, or 5.0.

[0043] As used herein, the term “substantially” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that is 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher compared to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length. In one embodiment, “substantially the same” refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that produces an effect, e.g., a physiological effect, that is approximately the same as a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.

[0044] As used herein, “nuclease” and “endonuclease” can be used interchangeably to refer to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of a nucleic acid. In embodiments, the nuclease is capable of cleaving DNA. In embodiments, the nuclease is capable of cleaving RNA. In embodiments, the nuclease is capable of cleaving both DNA and RNA.

[0045] The term "CRISPR-associated protein" refers to an RNA-guided endonuclease component of a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) gene editing system and includes wild-type proteins as well as homologs, variants, fragments and derivatives thereof that exhibit one or more desired biological properties or functions, including, but not limited to, the ability to be targeted by a guide RNA (gRNA) to a target nucleic acid sequence (e.g., DNA or RNA sequence) in a sequence-specific manner. In embodiments, a functional homolog, variant, fragment or derivative is capable of (1) specifically interacting with a target nucleic acid sequence, for example by binding to and, optionally, cleaving (by endonuclease or nickase activity) the target nucleic acid sequence, (2) associating with a guide RNA, (3) recognizing a protospacer adjacent motif (PAM) that is juxtaposed to a target DNA or RNA sequence, or (4) combinations thereof. CRISPR-associated proteins include, but are not limited to, Cas9, Cpf1 (Cas12), C2c1, C2c3, C2c2, Cas13, CasX and CasY. The term “CRISPR-associated protein” includes all post-translationally modified forms thereof, including, but not limited to glycosylation, phosphorylation, ubiquitinylation, S-nitrosylation, methylation, N-acetylation, lipidation, disulfide bond formation, sulfation, acylation, deamination etc. In embodiments, variants have a sequence that is at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, or 100% identical to an amino acid sequence of a naturally occurring (e.g., wild-type) CRISPR-associated protein. In embodiments, fragments have an amino acid sequence of at least 50, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200 or at least 250 contiguous amino acid residues of a naturally occurring (e.g., wild-type) CRISPR-associated protein.

[0046] The term "guide RNA" or “gRNA” refers to the RNA sequence used to target the CRISPR-Cas gene editing system to a nucleic acid of interest. In embodiments, the gRNA is a chimeric RNA molecule which includes a CRISPR RNA (crRNA) and trans-encoded CRISPR RNA (tracrRNA) component. In embodiments, the crRNA includes from about 19 to about 22 consecutive nucleotides that are at least about 80%, about 85%, about 90%, about 95% or about 100% complementary to a target nucleic acid sequence. Techniques for designing gRNAs are known; see, for example, Doench et al. (2014) Nature Biotechnology 32(12): 1262-7 and Graham et al. (2015) Genome Biol. 16: 260, the disclosures of which are incorporated by reference herein.
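
By way of a non-limiting illustration, the percent complementarity between a crRNA targeting segment and a target strand may be estimated as in the following minimal sketch. It assumes an ungapped, equal-length, antiparallel comparison and counts only Watson-Crick pairs (G-U wobble pairs are not counted); the function name and the toy sequences are hypothetical and are not taken from the disclosure.

```python
# Watson-Crick partners of each RNA guide base on a DNA or RNA target strand.
# Assumption for this sketch: G-U wobble pairing is NOT counted as complementary.
PAIRS = {"A": {"T", "U"}, "U": {"A"}, "G": {"C"}, "C": {"G"}}


def percent_complementarity(guide_rna: str, target: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'. The strands pair antiparallel, so
    guide position i is compared with the target read from its 3' end.
    """
    guide = guide_rna.upper()
    tgt = target.upper()
    if not guide or len(guide) != len(tgt):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(t in PAIRS.get(g, ()) for g, t in zip(guide, reversed(tgt)))
    return 100.0 * paired / len(guide)


if __name__ == "__main__":
    guide = "GACU" * 5                      # hypothetical 20-nt targeting segment
    perfect_target = "AGTC" * 5             # its exact reverse complement (DNA)
    two_mismatches = "CCTC" + "AGTC" * 4    # same target with two altered bases
    print(percent_complementarity(guide, perfect_target))
    print(percent_complementarity(guide, two_mismatches))
```

In this sketch, a 20-nucleotide segment with two mismatched positions is 90% complementary to its target and therefore falls within the “at least about 80%” range recited above.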

[0047] As used herein, “CRISPR-Cas gene-editing system” refers to a protein, for example, a Cas protein, a nucleic acid, for example, a guide RNA (gRNA), or a combination thereof, which may be used to edit a genome. The following patent documents describe CRISPR-Cas gene-editing systems: U.S. Pat. No. 8,697,359, U.S. Pat. No. 8,771,945, U.S. Pat. No. 8,795,965, U.S. Pat. No. 8,865,406, U.S. Pat. No. 8,871,445, U.S. Pat. No. 8,889,356, U.S. Pat. No. 8,895,308, U.S. Pat. No. 8,906,616, U.S. Pat. No. 8,932,814, U.S. Pat. No. 8,945,839, U.S. Pat. No. 8,993,233, U.S. Pat. No. 8,999,641, U.S. Pat. App. No. 14/704,551, and U.S. Pat. App. No. 13/842,859, each of which is incorporated by reference herein in its entirety.

[0048] The term “CRISPR-Cas ribonucleoprotein complex” or “ribonucleoprotein complex” (RNP) refers to a complex that includes a nuclease and targeting gRNA. In embodiments, the nuclease is a Cas protein.

[0049] The term “component of a CRISPR-Cas gene editing system” refers to a Cas endonuclease, a gRNA, a CRISPR-Cas ribonucleoprotein complex, or combinations thereof.

[0050] The term “modified looped nuclease” refers to a nuclease in which a CPP sequence described herein is inserted into a looped region of the nuclease.

[0051] The terms “peptide”, “polypeptide”, and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term “modified” refers to a substance or compound (e.g., a cell, a polynucleotide sequence, and/or a polypeptide sequence) that has been altered or changed as compared to the corresponding unmodified substance or compound. In embodiments, two or more amino acid residues are linked by the carboxyl group of one amino acid to the alpha amino group of another amino acid, thereby forming a peptide bond. In embodiments, the polypeptide comprises a peptide backbone modification in which two or more amino acids are covalently attached by a bond other than a peptide bond. In embodiments, the polypeptide comprises one or more non-natural amino acids, amino acid analogs, or other synthetic molecules that are capable of integrating into a polypeptide. The term “polypeptide” encompasses polypeptides comprising naturally occurring and artificially occurring amino acids. There is no upper limit to the number of amino acids that can be included in a polypeptide.

[0052] A residue of an amino acid, as used herein, refers to a derivative of the amino acid that is present in a particular product (e.g., peptide). To form the product, at least one atom of the amino acid is replaced by a bond to another moiety, such that the product contains a residue of the amino acid. For example, the CPPs described herein comprise amino acids (e.g., arginine) incorporated therein through formation of one or more peptide bonds. The amino acids incorporated into the CPP may be referred to as a “residue” of such amino acid, or simply as the amino acid. For example, arginine or an arginine residue refers to an arginine wherein the N- and C-terminus are attached to other amino acids through a peptide bond.

[0053] As used herein, “insert” or “insertion” means the addition of a CPP sequence into a protein sequence. In embodiments, the CPP sequence is inserted between amino acids in the looped region of a protein without removing or replacing amino acids of the protein, such that the resulting protein contains all of the amino acids in the native protein in addition to the CPP. In such embodiments, CPP insertion increases the total number of amino acids in the protein. In embodiments, the CPP replaces amino acids present in the loop region of a protein, such that the resulting protein does not contain all of the amino acids that were present prior to CPP insertion.
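
By way of a non-limiting illustration, the two modes of insertion described above (addition between loop residues versus replacement of loop residues) can be contrasted at the sequence level as in the following minimal sketch. The toy protein sequence, the loop coordinates, and the placeholder peptide “RRRR” are hypothetical and are not sequences of the disclosure.

```python
def insert_cpp(protein: str, cpp: str, after_residue: int) -> str:
    """Insert `cpp` after the 1-based residue `after_residue`.

    No native residue is removed, so the result is longer by len(cpp).
    """
    if not 0 <= after_residue <= len(protein):
        raise ValueError("insertion point lies outside the protein")
    return protein[:after_residue] + cpp + protein[after_residue:]


def replace_with_cpp(protein: str, cpp: str, start: int, end: int) -> str:
    """Replace native residues start..end (1-based, inclusive) with `cpp`."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("replaced span lies outside the protein")
    return protein[:start - 1] + cpp + protein[end:]


if __name__ == "__main__":
    native = "MKTAYGSGSLLE"   # hypothetical protein; residues 6-9 (GSGS) play the loop
    cpp = "RRRR"              # placeholder peptide, for illustration only
    print(insert_cpp(native, cpp, 7))           # all 12 native residues retained
    print(replace_with_cpp(native, cpp, 6, 9))  # the four loop residues are replaced
```

In the first case the resulting sequence contains every native residue plus the inserted peptide; in the second case the loop residues are absent from the resulting sequence.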

[0054] As used herein, “treat,” “treating,” “treatment” and variants thereof, refers to any administration of one or more of the disclosed compounds that partially or completely alleviates, ameliorates, relieves, inhibits, delays onset of, reduces severity of, and/or reduces incidence of one or more symptoms or features of a disease as described herein.

[0055] The terms “inhibit”, “inhibiting” or “inhibition” refer to a decrease in an activity, expression, function or other biological parameter and can include, but do not require, complete ablation of the activity, expression, function or other biological parameter. Inhibition can include, for example, at least about a 10% reduction in the activity, response, condition, or disease as compared to a control. In embodiments, expression, activity or function of a gene or protein is decreased by a statistically significant amount.

[0056] As used herein, “therapeutically effective” refers to an amount of a disclosed compound which confers a therapeutic effect on a patient. In embodiments, the therapeutically effective amount is an amount sufficient to treat a disease in a subject in need thereof.

[0057] As used herein, “cell penetrating peptide” or “CPP” refers to any peptide which is capable of penetrating a cell membrane. In embodiments, the CPP is cyclic, and may be represented as “cCPP”. The cyclic cell penetrating peptide is also capable of directing a compound (e.g., nuclease) to penetrate the membrane of a cell. In embodiments where the CPP is conjugated to the nuclease (rather than inserted into a looped region), the CPP is cyclic. In embodiments, the CPP delivers the nuclease to the cytosol of the cell. In embodiments, the CPP delivers the nuclease to the cellular location where the target sequence is located.

[0058] As used herein, the terms “exocyclic peptide” (EP) and “modulatory peptide” (MP) may be used interchangeably to refer to two or more amino acid residues linked by a peptide bond that are attached to the cyclic peptides described herein and alter the tissue distribution and/or retention of the compound. In embodiments, the modulatory peptide comprises at least one positively charged amino acid residue, e.g., at least one lysine residue and/or at least one arginine residue. Non-limiting examples of exocyclic peptides are described herein. In embodiments, the exocyclic peptide can comprise a peptide that has been identified in the art as a “nuclear localization signal” (NLS).

[0059] As used herein, the term "nuclear localization sequence" (NLS) refers to an amino acid sequence which induces transport of molecules including such sequences or linked to such sequences into the nucleus of eukaryotic cells. Non-limiting examples of nuclear localization sequences include the nuclear localization sequence of the SV40 virus large T-antigen, the minimal functional unit of which is the seven amino acid sequence PKKKRKV, the nucleoplasmin bipartite NLS with the sequence NLSKRPAAIKKAGQAKKKK, the c-myc nuclear localization sequence comprising the amino acid sequence PAAKRVKLD or RQRRNELKRSF, the sequence RMRKFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV of the IBB domain from importin-alpha, the sequences VSRKRPRP and PPKKARED of the myoma T protein, the sequence PQPKKKPL of human p53, the sequence SALIKKKKKMAP of mouse c-abl IV, the sequences DRLRR and PKQKKRK of the influenza virus NS1, the sequence RKLKKKIKKL of the Hepatitis virus delta antigen and the sequence REKKKFLKRR of the mouse Mx1 protein, the sequence KRKGDEVDGVDEVAKKKSKK of the human poly(ADP-ribose) polymerase and the sequence RKCLQAGMNLEARKTKK of the steroid hormone receptors (human) glucocorticoid. International Publication No. 2001/038547 describes additional examples of NLSs and is incorporated by reference herein in its entirety.
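
By way of a non-limiting illustration, a polypeptide sequence may be scanned for exact occurrences of some of the NLS sequences recited above, as in the following minimal sketch. Exact string matching is a simplifying assumption of the example, not a method of the disclosure; the function name, the selection of three motifs, and the query sequence are hypothetical.

```python
# A few of the NLS sequences recited above, keyed by their source protein.
NLS_MOTIFS = {
    "SV40 large T-antigen": "PKKKRKV",
    "c-myc": "PAAKRVKLD",
    "human p53": "PQPKKKPL",
}


def find_nls(protein: str) -> list[tuple[str, int]]:
    """Return (motif name, 1-based start position) for every exact match.

    Exact matching is a simplification; it does not detect variant or
    bipartite signals that differ from the listed sequences.
    """
    seq = protein.upper()
    hits = []
    for name, motif in NLS_MOTIFS.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((name, start + 1))
            start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    query = "MAPKKKRKVGS"  # hypothetical sequence carrying the SV40 motif
    print(find_nls(query))
```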

[0060] As used herein, “linker” or “L” refers to a moiety that covalently bonds two or more moieties (e.g., a CPP and a nuclease or guide RNA sequence, and optionally an exocyclic peptide). In embodiments, the linker can be a natural or non-natural amino acid or polypeptide. In other embodiments, the linker is a synthetic compound containing two or more appropriate functional groups suitable to bind a CPP and a nuclease or guide RNA sequence, to thereby form the constructs disclosed herein. In yet another embodiment, the linker comprises an M moiety to thereby conjugate the CPP to the nuclease or guide RNA sequence. In embodiments, the CPP may be covalently bound to the Cas nuclease via a linker.

[0061] As used herein, the term “sequence identity” refers to the percentage of nucleic acids or amino acids between two oligonucleotide or polypeptide sequences, respectively, that are the same and in the same relative position. As such one oligonucleotide or polypeptide sequence has a certain percentage of sequence identity compared to another oligonucleotide or polypeptide sequence, respectively. For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. Those of ordinary skill in the art will appreciate that two sequences are generally considered to be “substantially identical” if they contain identical residues in corresponding positions. In embodiments, the sequence identity between two amino acid sequences may be determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), in the version that exists as of the date of filing. The parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labeled “longest identity” (obtained using the -nobrief option) is used as the percent identity and is calculated as follows: (Identical Residues × 100)/(Length of Alignment - Total Number of Gaps in Alignment).
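
By way of a non-limiting illustration, the percent identity formula recited above can be applied to an already aligned pair of sequences as in the following minimal sketch. The sketch does not perform the Needleman-Wunsch alignment itself and does not call the Needle program; it assumes the two aligned strings are supplied with “-” marking gaps, and it assumes that “Total Number of Gaps in Alignment” means the number of alignment columns containing a gap in either sequence. The function name and the toy alignment are hypothetical.

```python
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity = identical * 100 / (alignment length - gap columns).

    Inputs are the two rows of a pairwise alignment and must have equal
    length. A column counts as a gap if either row has `gap` there
    (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(a == b and a != gap for a, b in columns)
    gaps = sum(a == gap or b == gap for a, b in columns)
    compared = len(columns) - gaps
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: 7 columns, 1 gap column, 5 identical residues.
    print(round(longest_identity("MKT-AYG", "MKTLAFG"), 2))
```

For the toy alignment shown, the calculation is (5 × 100)/(7 - 1), i.e., approximately 83.33%.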

[0062] In other embodiments, sequence identity may be determined using the Smith-Waterman algorithm, in the version that exists as of the date of filing.

[0063] As used herein, “sequence homology” refers to the percentage of amino acids between two polypeptide sequences, or the percentage of nucleic acids between two oligonucleotide sequences, that are homologous and in the same relative position. As such one polypeptide sequence has a certain percentage of sequence homology compared to another polypeptide sequence. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be “substantially homologous” if they contain homologous residues in corresponding positions. Homologous residues may be identical residues. Alternatively, homologous residues may be non-identical residues with appropriately similar structural and/or functional characteristics. For example, as is well known by those of ordinary skill in the art, certain amino acids are typically classified as “hydrophobic” or “hydrophilic” amino acids, and/or as having “polar” or “non-polar” side chains, and substitution of one amino acid for another of the same type may often be considered a “homologous” substitution.

[0064] As is well known in this art, amino acid sequences or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTP, gapped BLAST, and PSI-BLAST, in existence as of the date of filing. Exemplary such programs are described in Altschul, et al., Basic local alignment search tool, J. Mol. Biol., 215(3): 403-410, 1990; Altschul, et al., Methods in Enzymology; Altschul, et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”, Nucleic Acids Res. 25:3389-3402, 1997; Baxevanis, et al., Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley, 1998; and Misener, et al., (eds.), Bioinformatics Methods and Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1999. In addition to identifying homologous sequences, the programs mentioned above typically provide an indication of the degree of homology.

[0065] As used herein, the terms “targeting” or “targeted to” refer to the association of a nuclease with a target nucleic acid molecule or a region of a target nucleic acid molecule. In embodiments, the nuclease is associated with a guide RNA (gRNA) that is capable of hybridizing to a target nucleic acid under physiological conditions. In embodiments, the nuclease targets a specific portion or site within the target nucleic acid, for example, a portion of the target nucleic acid comprising at least one protospacer adjacent motif (PAM) sequence or region.

[0066] As used herein, the terms "target nucleic acid" and "target sequence" refer to a nucleic acid molecule comprising a nucleic acid sequence to which the construct binds or hybridizes. Target nucleic acids include, but are not limited to, RNA (including, but not limited to pre-mRNA and mRNA or portions thereof), DNA, including, for example, genomic DNA or cDNA, as well as non-translated RNA, such as miRNA. In embodiments, a target nucleic acid can be a cellular gene (or mRNA transcribed from such gene) whose expression is associated with a particular disorder or disease state, or a nucleic acid molecule from an infectious agent. In embodiments, the target nucleic acid is RNA. In embodiments, the target nucleic acid is DNA. In embodiments, the target nucleic acid is mRNA. In embodiments, the target nucleic acid is pre-mRNA.

[0067] As used herein, the term “mRNA” refers to an RNA molecule that encodes a protein and comprises pre-mRNA or mature mRNA. "Pre-mRNA" refers to a newly synthesized eukaryotic mRNA molecule directly after DNA transcription. In embodiments, a pre-mRNA is capped with a 5' cap, modified with a 3' poly-A tail, and/or spliced to produce a mature mRNA sequence. In embodiments, pre-mRNA comprises one or more introns. In one embodiment, the pre-mRNA undergoes a process known as splicing to remove introns and join exons. In embodiments, pre-mRNA comprises a polyadenylation site.

[0068] “Target nucleic acid sequence” refers to a nucleic acid sequence to which the gRNA of a CRISPR-Cas ribonucleoprotein complex (RNP) hybridizes. In embodiments, the target nucleic acid sequence is a DNA sequence. In embodiments, the target nucleic acid sequence is an RNA sequence. In embodiments, the target nucleic acid sequence is an mRNA sequence. In embodiments, the target nucleic acid sequence is a pre-mRNA sequence. In embodiments, the target nucleic acid sequence is a mature mRNA sequence.

[0069] As used herein, the term "gene" refers to a nucleic acid molecule comprising a nucleic acid sequence that encompasses a 5' promoter region associated with the expression of the gene product, and any intron and exon regions and 3' untranslated regions ("UTR") associated with the expression of the gene product.

[0070] The term "target gene" refers to a gene that includes a nucleic acid sequence to which the gRNA of a CRISPR-Cas ribonucleoprotein complex (RNP) hybridizes or that encodes a target mRNA, for example, a target pre-mRNA or a mature target mRNA.

[0071] The "target protein" refers to the amino acid sequence encoded by the target gene or target mRNA. In embodiments, the target protein may have aberrant or reduced activity, or may not be a functional protein.

[0072] “Alkyl”, “alkyl group” or “alkyl chain” refers to a fully saturated, straight or branched hydrocarbon chain having from one to twelve carbon atoms, and which is attached to the rest of the molecule by a single bond. Alkyls comprising any number of carbon atoms from 1 to 12 are included. An alkyl comprising up to 12 carbon atoms is a C1-C12 alkyl, an alkyl comprising up to 10 carbon atoms is a C1-C10 alkyl, an alkyl comprising up to 6 carbon atoms is a C1-C6 alkyl and an alkyl comprising up to 5 carbon atoms is a C1-C5 alkyl. A C1-C5 alkyl includes C5 alkyls, C4 alkyls, C3 alkyls, C2 alkyls and C1 alkyl (i.e., methyl). A C1-C6 alkyl includes all moieties described above for C1-C5 alkyls but also includes C6 alkyls. A C1-C10 alkyl includes all moieties described above for C1-C5 alkyls and C1-C6 alkyls, but also includes C7, C8, C9 and C10 alkyls. Similarly, a C1-C12 alkyl includes all the foregoing moieties, but also includes C11 and C12 alkyls. Non-limiting examples of C1-C12 alkyl include methyl, ethyl, n-propyl, i-propyl, sec-propyl, n-butyl, i-butyl, sec-butyl, t-butyl, n-pentyl, t-amyl, n-hexyl, n-heptyl, n-octyl, n-nonyl, n-decyl, n-undecyl, and n-dodecyl. Unless stated otherwise specifically in the specification, an alkyl group can be optionally substituted.

[0073] “Alkylene”, “alkylene group” or “alkylene chain” refer to a fully saturated, straight or branched divalent or trivalent hydrocarbon chain radical, having from one to forty carbon atoms. Non-limiting examples of C2-C40 alkylene include ethylene, propylene, n-butylene, ethenylene, propenylene, n-butenylene, propynylene, n-butynylene, and the like. The alkylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an alkylene chain can be optionally substituted as described herein.

[0074] “Alkenylene”, “alkenylene group”, or “alkenylene chain” refer to a straight or branched divalent or trivalent hydrocarbon chain radical, having from two to forty carbon atoms, and having one or more carbon-carbon double bonds. Non-limiting examples of C2-C40 alkenylene include ethene, propene, butene, and the like. The alkenylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an alkenylene chain can be optionally substituted.

[0075] “Alkynyl”, “alkynyl group” or “alkynyl chain” refer to a straight or branched hydrocarbon chain having from two to twelve carbon atoms and having one or more carbon-carbon triple bonds. Each alkynyl group is attached to the rest of the molecule by a single bond. Alkynyl groups comprising any number of carbon atoms from 2 to 12 are included. An alkynyl group comprising up to 12 carbon atoms is a C2-C12 alkynyl, an alkynyl comprising up to 10 carbon atoms is a C2-C10 alkynyl, an alkynyl group comprising up to 6 carbon atoms is a C2-C6 alkynyl and an alkynyl comprising up to 5 carbon atoms is a C2-C5 alkynyl. A C2-C5 alkynyl includes C5 alkynyls, C4 alkynyls, C3 alkynyls, and C2 alkynyls. A C2-C6 alkynyl includes all moieties described above for C2-C5 alkynyls but also includes C6 alkynyls. A C2-C10 alkynyl includes all moieties described above for C2-C5 alkynyls and C2-C6 alkynyls, but also includes C7, C8, C9 and C10 alkynyls. Similarly, a C2-C12 alkynyl includes all the foregoing moieties, but also includes C11 and C12 alkynyls. Non-limiting examples of C2-C12 alkynyl include ethynyl, propynyl, butynyl, pentynyl and the like. Unless stated otherwise specifically in the specification, an alkynyl group can be optionally substituted.

[0076] “Alkynylene”, “alkynylene group” or “alkynylene chain” refers to a straight or branched divalent or trivalent hydrocarbon chain, having from two to forty carbon atoms, and having one or more carbon-carbon triple bonds. Non-limiting examples of C2-C40 alkynylene include ethynylene, propargylene and the like. The alkynylene chain is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an alkynylene chain can be optionally substituted.

[0077] “Carbocyclyl,” “carbocyclic ring” or “carbocycle” refers to a ring structure, wherein the atoms which form the ring are each carbon, and which is attached to the rest of the molecule by a single bond. Carbocyclic rings can comprise from 3 to 20 carbon atoms in the ring. Unless stated otherwise specifically in the specification, the carbocyclyl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems. Carbocyclic rings include aryls and cycloalkyl, cycloalkenyl, and cycloalkynyl as defined herein. Unless stated otherwise specifically in the specification, a carbocyclyl group can be optionally substituted. In embodiments, the carbocyclyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, a heterocyclyl group can be optionally substituted.

[0078] “Cycloalkyl” refers to a stable non-aromatic monocyclic or polycyclic fully saturated hydrocarbon having from 3 to 40 carbon atoms and at least one ring, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems. Monocyclic cycloalkyls include, for example, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cycloheptyl, and cyclooctyl. Polycyclic cycloalkyls include, for example, adamantyl, norbornyl, decalinyl, 7,7-dimethyl-bicyclo[2.2.1]heptanyl, and the like. In embodiments, the cycloalkyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless otherwise stated specifically in the specification, a cycloalkyl group can be optionally substituted.

[0079] “Cycloalkenyl” refers to a stable non-aromatic monocyclic or polycyclic hydrocarbon having from 3 to 40 carbon atoms, at least one ring, and one or more carbon-carbon double bonds, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems. Monocyclic cycloalkenyls include, for example, cyclopentenyl, cyclohexenyl, cycloheptenyl, cyclooctenyl, and the like. Polycyclic cycloalkenyl radicals include, for example, bicyclo[2.2.1]hept-2-enyl and the like. In embodiments, the cycloalkenyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless otherwise stated specifically in the specification, a cycloalkenyl group can be optionally substituted.

[0080] “Cycloalkynyl” refers to a stable non-aromatic monocyclic or polycyclic hydrocarbon having from 3 to 40 carbon atoms, at least one ring, and one or more carbon-carbon triple bonds, wherein the ring consists solely of carbon and hydrogen atoms, which can include fused or bridged ring systems. Monocyclic cycloalkynyls include, for example, cycloheptynyl, cyclooctynyl, and the like. The cycloalkynyl is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless otherwise stated specifically in the specification, a cycloalkynyl group can be optionally substituted.

[0081] “Aryl” refers to a hydrocarbon ring system comprising hydrogen, 6 to 40 carbon atoms and at least one aromatic ring. For purposes of this disclosure, the aryl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems. Aryls include, but are not limited to, aryl divalent radicals derived from aceanthrylene, acenaphthylene, acephenanthrylene, anthracene, azulene, benzene, chrysene, fluoranthene, fluorene, as-indacene, s-indacene, indane, indene, naphthalene, phenalene, phenanthrene, pleiadene, pyrene, and triphenylene. In embodiments, the aryl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, an aryl group can be optionally substituted.

[0082] “Heterocyclyl,” “heterocyclic ring” or “heterocycle” refers to a stable 3- to 22-membered ring system which includes two to fourteen carbon atoms and from one to eight heteroatoms selected from nitrogen, oxygen and sulfur. Heterocyclyl or heterocyclic rings include heteroaryls as defined below. Unless stated otherwise specifically in the specification, the heterocyclyl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems; and the nitrogen, carbon or sulfur atoms in the heterocyclyl can be optionally oxidized; the nitrogen atom can be optionally quaternized; and the heterocyclyl can be partially or fully saturated. Examples of such heterocyclyl radicals include, but are not limited to, dioxolanyl, thienyl[1,3]dithianyl, decahydroisoquinolyl, imidazolinyl, imidazolidinyl, isothiazolidinyl, isoxazolidinyl, morpholinyl, octahydroindolyl, octahydroisoindolyl, 2-oxopiperazinyl, 2-oxopiperidinyl, 2-oxopyrrolidinyl, oxazolidinyl, piperidinyl, piperazinyl, 4-piperidonyl, pyrrolidinyl, succinimidyl, pyrazolidinyl, quinuclidinyl, thiazolidinyl, tetrahydrofuryl, trithianyl, tetrahydropyranyl, thiomorpholinyl, thiamorpholinyl, 1-oxo-thiomorpholinyl, and 1,1-dioxo-thiomorpholinyl. In embodiments, the heterocyclyl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, a heterocyclyl group can be optionally substituted.

[0083] “Heteroaryl” refers to a 5- to 22-membered aromatic ring comprising hydrogen atoms, one to fourteen carbon atoms, one to eight heteroatoms selected from nitrogen, oxygen and sulfur, and at least one aromatic ring. For purposes of this disclosure, the heteroaryl can be a monocyclic, bicyclic, tricyclic or tetracyclic ring system, which can include fused or bridged ring systems; and the nitrogen, carbon or sulfur atoms in the heteroaryl can be optionally oxidized; the nitrogen atom can be optionally quaternized. Examples include, but are not limited to, azepinyl, acridinyl, benzimidazolyl, benzothiazolyl, benzindolyl, benzodioxolyl, benzofuranyl, benzooxazolyl, benzothiazolyl, benzothiadiazolyl, benzo[b][1,4]dioxepinyl, 1,4-benzodioxanyl, benzonaphthofuranyl, benzoxazolyl, benzodioxolyl, benzodioxinyl, benzopyranyl, benzopyranonyl, benzofuranyl, benzofuranonyl, benzothienyl (benzothiophenyl), benzotriazolyl, benzo[4,6]imidazo[1,2-a]pyridinyl, carbazolyl, cinnolinyl, dibenzofuranyl, dibenzothiophenyl, furanyl, furanonyl, isothiazolyl, imidazolyl, indazolyl, indolyl, indazolyl, isoindolyl, indolinyl, isoindolinyl, isoquinolyl, indolizinyl, isoxazolyl, naphthyridinyl, oxadiazolyl, 2-oxoazepinyl, oxazolyl, oxiranyl, 1-oxidopyridinyl, 1-oxidopyrimidinyl, 1-oxidopyrazinyl, 1-oxidopyridazinyl, 1-phenyl-1H-pyrrolyl, phenazinyl, phenothiazinyl, phenoxazinyl, phthalazinyl, pteridinyl, purinyl, pyrrolyl, pyrazolyl, pyridinyl, pyrazinyl, pyrimidinyl, pyridazinyl, quinazolinyl, quinoxalinyl, quinolinyl, quinuclidinyl, isoquinolinyl, tetrahydroquinolinyl, thiazolyl, thiadiazolyl, triazolyl, tetrazolyl, triazinyl, and thiophenyl (i.e. thienyl). In embodiments, the heteroaryl is divalent and is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, a heteroaryl group can be optionally substituted.

[0084] The term “ether” used herein refers to a divalent moiety having a formula -[(R1)m-O-(R2)n]z- wherein each of m, n, and z is independently an integer from 1 to 40, and R1 and R2 are independently an alkylene. Examples include polyethylene glycol. The ether is attached, directly or indirectly, to the CPP through a single bond and, directly or indirectly, to the nuclease through a single bond (and optionally to the exocyclic peptide through a single bond). Unless stated otherwise specifically in the specification, the ether can be optionally substituted.

[0085] The term “substituted” used herein means any of the above groups (i.e., alkylene, alkenylene, alkynylene, aryl, carbocyclyl, cycloalkyl, cycloalkenyl, cycloalkynyl, heterocyclyl, heteroaryl, and/or ether) wherein at least one hydrogen atom is replaced by a bond to a non-hydrogen atom such as, but not limited to: a deuterium atom; a halogen atom such as F, Cl, Br, and I; an oxygen atom in groups such as hydroxyl groups, alkoxy groups, and ester groups; a sulfur atom in groups such as thiol groups, thioalkyl groups, sulfone groups, sulfonyl groups, and sulfoxide groups; a nitrogen atom in groups such as amines, amides, alkylamines, dialkylamines, arylamines, alkylarylamines, diarylamines, N-oxides, imides, and enamines; a silicon atom in groups such as trialkylsilyl groups, dialkylarylsilyl groups, alkyldiarylsilyl groups, and triarylsilyl groups; and other heteroatoms in various other groups. “Substituted” also means any of the above groups in which one or more hydrogen atoms are replaced by a higher-order bond (e.g., a double- or triple-bond) to a heteroatom such as oxygen in oxo, carbonyl, carboxyl, and ester groups; and nitrogen in groups such as imines, oximes, hydrazones, and nitriles. For example, “substituted” includes any of the above groups in which one or more hydrogen atoms are replaced with -NRgRh, -NRgC(=O)Rh, -NRgC(=O)NRgRh, -NRgC(=O)ORh, -NRgSO2Rh, -OC(=O)NRgRh, -ORg, -SRg, -SORg, -SO2Rg, -OSO2Rg, -SO2ORg, =NSO2Rg, and -SO2NRgRh. “Substituted” also means any of the above groups in which one or more hydrogen atoms are replaced with -C(=O)Rg, -C(=O)ORg, -C(=O)NRgRh, -CH2SO2Rg, -CH2SO2NRgRh. In the foregoing, Rg and Rh are the same or different and independently hydrogen, alkyl, alkenyl, alkynyl, alkoxy, alkylamino, thioalkyl, aryl, aralkyl, cycloalkyl, cycloalkenyl, cycloalkynyl, cycloalkylalkyl, haloalkyl, haloalkenyl, haloalkynyl, heterocyclyl, N-heterocyclyl, heterocyclylalkyl, heteroaryl, N-heteroaryl and/or heteroarylalkyl. “Substituted” further means any of the above groups in which one or more hydrogen atoms are replaced by a bond to an amino, cyano, hydroxyl, imino, nitro, oxo, thioxo, halo, alkyl, alkenyl, alkynyl, alkoxy, alkylamino, thioalkyl, aryl, aralkyl, cycloalkyl, cycloalkenyl, cycloalkynyl, cycloalkylalkyl, haloalkyl, haloalkenyl, haloalkynyl, heterocyclyl, N-heterocyclyl, heterocyclylalkyl, heteroaryl, N-heteroaryl and/or heteroarylalkyl group. In addition, each of the foregoing substituents can also be optionally substituted with one or more of the above substituents. Further, those skilled in the art will recognize that “substituted” also encompasses instances in which one or more hydrogen atoms on any of the above groups are replaced by a substituent listed in this paragraph, and the substituent then forms a covalent bond with the CPP or nuclease. The resulting bonding group can be considered a “substituent.” For example, in embodiments, any of the above groups can be substituted at a first position with a carboxylic acid (i.e., -C(=O)OH) which forms an amide bond with an appropriate amino acid of the CPP (e.g., lysine), and also substituted at a second position with either an electrophilic group (e.g., -C(=O)H, -CO2Rg, -halide, etc.) or a nucleophilic group (-NH2, -NHRg, -OH, etc.) which forms a bond with the nuclease (either at the N-terminus, C-terminus, or a side chain of an amino acid), or the guide RNA sequence (either at the 5' end, the 3' end, or the backbone). The resulting bond, e.g., amide bond, can be considered a “substituent.” In embodiments, the second position is substituted with a thiol group which forms a disulfide bond with a -SH group on the nuclease. The resulting disulfide is encompassed by the term substituent.

[0086] As used herein, the symbol “ ” (which can hereinafter be referred to as “a point of attachment bond”) denotes a bond that is a point of attachment between two chemical entities, one of which is depicted as being attached to the point of attachment bond and the other of which is not depicted as being attached to the point of attachment bond. For example, “ ” indicates that the chemical entity “XY” is bonded to another chemical entity via the point of attachment bond. Furthermore, the specific point of attachment to the non-depicted chemical entity can be specified by inference. For example, the compound CH3-R3, wherein R3 is H or “ ”, implies that when R3 is “XY”, the point of attachment bond is the same bond as the bond by which R3 is depicted as being bonded to CH3.

Cell-penetrating peptides

[0087] In embodiments, the present disclosure provides constructs that include at least one cell penetrating peptide (CPP) sequence and at least one component of a CRISPR-Cas gene editing system. In embodiments, the construct includes one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) components of a CRISPR-Cas gene editing system. In embodiments, the construct includes one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) cell penetrating peptides (CPPs).

[0088] In embodiments, the present disclosure provides nucleases comprising at least one cell penetrating peptide (CPP) sequence. In embodiments, a nuclease is provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPPs may be inserted into the nuclease. In embodiments, constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPPs may be conjugated to a nuclease. In embodiments, constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPPs may be conjugated to a gRNA. In embodiments, constructs are provided in which one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPPs may be conjugated to a CRISPR-Cas ribonucleoprotein complex (RNP).

[0089] In embodiments, the CPP is inserted at any suitable location in the nuclease, such as the N- or C-terminus, or between the N- and C-terminus. In embodiments, the nuclease comprises at least one loop region, and the CPP is inserted in the at least one loop region. The nuclease can contain any number of loops and any number of CPP sequences. One skilled in the art will recognize that the suitable loops for CPP insertion are those in which CPP insertion does not abolish the desired activity of the protein. Methods for determining the impact of CPP insertion on protein activity are known in the art (see, for example, the methods described herein). In embodiments, the nuclease comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more loops, and 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 CPP sequences are inserted into the loop(s). In embodiments, the CPP is inserted into at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% and up to about 50%, up to about 60%, up to about 70%, up to about 80%, up to about 90% or up to about 100% of the loop regions in the nuclease.
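By way of illustration only, inserting a CPP sequence at a chosen position, and computing the fraction of loop regions that carry an insertion, can be sketched in Python as follows. The function names, the toy sequences, and the convention that an insertion counts as lying in a loop only when both flanking residues belong to that loop are assumptions made for this sketch and are not taken from the disclosure. Positions are 1-based.

```python
def insert_cpp(protein: str, cpp: str, after_position: int) -> str:
    """Insert `cpp` after the given 1-based residue position.

    after_position = 0 places the CPP at the N-terminus;
    after_position = len(protein) places it at the C-terminus.
    """
    if not 0 <= after_position <= len(protein):
        raise ValueError("after_position must lie within the protein")
    return protein[:after_position] + cpp + protein[after_position:]


def fraction_of_loops_with_cpp(loops, insert_positions) -> float:
    """Fraction of loop regions that contain at least one insertion.

    `loops` holds 1-based inclusive (start, end) ranges. An insertion after
    residue p is counted for a loop when start <= p < end, i.e. when both
    flanking residues lie in that loop (an assumption of this sketch).
    """
    if not loops:
        raise ValueError("at least one loop region is required")
    modified = sum(
        any(start <= p < end for p in insert_positions) for start, end in loops
    )
    return modified / len(loops)


# Hypothetical toy protein and CPP strings, for illustration only.
toy_protein = "MKTAYIAK"
toy_cpp = "RRRWWW"
print(insert_cpp(toy_protein, toy_cpp, 3))
print(fraction_of_loops_with_cpp([(2, 4), (6, 8)], [3]))
```

In this sketch, an insertion into one of two annotated loops corresponds to 50% of the loop regions being modified.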

[0090] In embodiments, the present disclosure provides constructs that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) nucleases. In embodiments, constructs are provided that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) gRNAs. In embodiments, constructs are provided that include one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CRISPR-Cas ribonucleoprotein complexes (RNPs).

[0091] In embodiments (in addition to CPP insertion, or in the alternative to CPP insertion), one or more (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CPPs may be conjugated to a nuclease, a guide RNA (gRNA), a ribonucleoprotein complex (RNP) or a combination thereof. Conjugation of the CPP to the nuclease, gRNA or RNP may occur at any suitable location, such as the N- or C-terminus of the nuclease, a side chain of an amino acid in the nuclease with a suitable functional group, the 5’ or 3’ end of the gRNA, or the backbone of the gRNA. In embodiments where the CPP is conjugated to the nuclease, the gRNA or the RNP, the CPP may be cyclic.

[0092] The CPP may be or include any amino acid sequence which facilitates cellular uptake of the modified looped proteins (e.g., nucleases) disclosed herein. Suitable CPPs can include naturally occurring sequences, modified sequences, and synthetic sequences, and linear or cyclic sequences, which facilitate uptake of a looped nuclease. Non-limiting examples of linear CPPs include Polyarginine (e.g., R9 or R11), Antennapedia sequences, HIV-TAT, Penetratin, Antp-3A (Antp mutant), Buforin II, Transportan, MAP (model amphipathic peptide), K-FGF, Ku70, Prion, pVEC, Pep-1, SynB1, Pep-7, HN-1, BGSC (Bis-Guanidinium-Spermidine-Cholesterol), and BGTC (Bis-Guanidinium-Tren-Cholesterol).

[0093] In embodiments, the total number of amino acids in the CPP may be in the range of from about 4 to about 20 amino acids, e.g., about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19 or about 20 amino acids, inclusive of all ranges and subranges therebetween. In embodiments, the CPPs disclosed herein comprise about 4 to about 13 amino acids. In embodiments, the CPPs disclosed herein comprise about 6 to about 10 amino acids, or about 6 to about 8 amino acids.

[0094] Each amino acid in the CPP may be a natural or non-natural amino acid. The term “non-natural amino acid” refers to an organic compound that is a congener of a natural amino acid in that it has an amine (-NH2) group on one end and a carboxylic acid (-COOH) group on the other end but the side chain or backbone is modified. The resulting moiety has a structure and reactivity that is similar but not identical to a natural amino acid. Non-limiting examples of such modifications include elongation or truncation of the side chain by one or more methylene groups, replacing one atom with another, and increasing the size of an aromatic ring. The non-natural amino acid can be a modified amino acid, and/or amino acid analog, that is not one of the 20 common naturally occurring amino acids or the rare natural amino acids selenocysteine or pyrrolysine. For example, an analog of arginine may have one more or one fewer methylene groups on the side chain. Non-natural amino acids can also be the D-isomer of the natural amino acids. Examples of suitable amino acids include, but are not limited to, alanine, alloisoleucine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, naphthylalanine, phenylalanine, proline, pyroglutamic acid, serine, threonine, tryptophan, tyrosine, valine, a derivative, or combinations thereof. These, and others, are listed in Table A along with their abbreviations used herein.

Table A: Amino Acid Abbreviations

[0095] In embodiments, the CPP comprises at least 2 arginine residues, or analogs thereof, e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises 2-6 arginine residues, or analogs thereof. In embodiments, the CPP comprises at least 3 arginine residues, or analogs thereof, e.g., 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises from 3-6 arginine residues, or analogs thereof. In embodiments, the CPP comprises at least 2 arginine residues, e.g., 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, the CPP comprises 3-6 arginine residues, e.g., 3, 4, 5, 6, 7, 8, 9, or 10.

[0096] In embodiments, the CPP comprises at least one amino acid residue with a hydrophobic side chain, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid residues with a hydrophobic side chain. In embodiments, the CPP comprises from 1-6 amino acid residues with a hydrophobic side chain.

[0097] Amino acids having higher hydrophobicity values can be selected for inclusion in the CPP sequence to improve cytosolic delivery efficiency of the modified proteins relative to CPP sequences comprising amino acids having a lower hydrophobicity value. In embodiments, each hydrophobic amino acid (also referred to herein as an amino acid having a hydrophobic side chain) independently has a hydrophobicity value which is greater than that of glycine. In other embodiments, each hydrophobic amino acid independently has a hydrophobicity value which is greater than that of alanine. In still other embodiments, each hydrophobic amino acid independently has a hydrophobicity value which is greater or equal to phenylalanine. Hydrophobicity may be measured using hydrophobicity scales known in the art.

Table B below lists hydrophobicity values for various amino acids as reported by Eisenberg and Weiss (Proc. Natl. Acad. Sci. U.S.A. 1984;81(1):140-144), Engelman, et al. (Ann. Rev. of Biophys. Biophys. Chem. 1986;15:321-353), Kyte and Doolittle (J. Mol. Biol. 1982;157(1):105-132), Hoop and Woods (Proc. Natl. Acad. Sci. U.S.A. 1981;78(6):3824-3828), and Janin (Nature. 1979;277(5696):491-492), each of which is herein incorporated by reference in its entirety. In embodiments, hydrophobicity is measured using the hydrophobicity scale reported in Engelman, et al.
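The threshold comparisons described above (greater than glycine, greater than alanine, greater than or equal to phenylalanine) can be illustrated with a short Python sketch. The sketch uses the published Kyte and Doolittle hydropathy values for the 20 common amino acids as one example scale; it does not reproduce Table B, and the function name and its parameters are hypothetical. Which residues pass a given threshold depends on the scale chosen; on the Kyte and Doolittle scale, for instance, tryptophan scores below glycine, which may not hold on the other scales cited above.

```python
# Kyte and Doolittle hydropathy values (one-letter codes), used here only as
# an example scale; the disclosure cites several alternative scales.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def more_hydrophobic_than(reference, scale=KYTE_DOOLITTLE, inclusive=False):
    """Return residues whose scale value exceeds that of `reference`.

    With inclusive=True, residues equal to the reference value are kept too
    (the "greater than or equal to" case).
    """
    threshold = scale[reference]
    if inclusive:
        keep = [aa for aa, value in scale.items() if value >= threshold]
    else:
        keep = [aa for aa, value in scale.items() if value > threshold]
    return sorted(keep)


print(more_hydrophobic_than("G"))                  # greater than glycine
print(more_hydrophobic_than("A"))                  # greater than alanine
print(more_hydrophobic_than("F", inclusive=True))  # at least phenylalanine
```

Passing a different dictionary as `scale` applies the same comparison to any other scale.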

Table B: Amino acid hydrophobicity values

[0098] In embodiments, the CPP sequence comprises 4, 5, 6, 7, 8, 9, 10, or more amino acid residues. In embodiments, the CPP sequence comprises from 1-6 D-amino acids. In embodiments, the chirality of the amino acids can be selected to improve cytosolic uptake efficiency. In embodiments, at least two of the amino acids have the opposite chirality. In embodiments, the at least two amino acids having the opposite chirality can be adjacent to each other. In embodiments, at least three amino acids have alternating stereochemistry relative to each other. In embodiments, the at least three amino acids having the alternating chirality relative to each other can be adjacent to each other. In embodiments, at least two of the amino acids have the same chirality. In embodiments, the at least two amino acids having the same chirality can be adjacent to each other. In embodiments, at least two amino acids have the same chirality and at least two amino acids have the opposite chirality. In embodiments, the at least two amino acids having the opposite chirality can be adjacent to the at least two amino acids having the same chirality. Accordingly, in embodiments, adjacent amino acids in the CPP can have any of the following sequences: D-L; L-D; D-L-L-D; L-D-D-L; L-D-L-L-D; D-L-D-D-L; D-L-L-D-L; or L-D-D-L-D. Methods of incorporating D amino acids in the CPP sequence during protein synthesis are known in the art, see, e.g., Huang et al., Toward D-peptide biosynthesis: Elongation Factor P enables ribosomal incorporation of consecutive D-amino acids. (2017) bioRxiv 125930; doi: https://doi.org/10.1101/125930; Katoh et al., Consecutive elongation of D-amino acids in translation. (2017) Cell Chemical Biology 24:46-54. Proteins containing non-natural amino acids may be produced using native chemical ligation, see, e.g., Bondalapati, et al., Expanding the chemical toolbox for the synthesis of large and uniquely modified proteins. (2016) Nature Chemistry 8:407-418; Rabideau and Pentelute, Delivery of Non-Native Cargo into Mammalian Cells Using Anthrax Lethal Toxin. (2016) ACS Chem. Biol. 11(6):1490-1501; and Weidmann et al., Copying Life: Synthesis of an Enzymatically Active Mirror-Image DNA-Ligase Made of D-Amino Acids. (2019) Cell Chemical Biology 26(5):616-619.
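As a non-limiting illustration, the chirality arrangements listed above can be checked programmatically. The Python sketch below assumes the common one-letter convention in which an uppercase letter denotes an L-amino acid and a lowercase letter denotes a D-amino acid; that convention, the function names, and the toy peptide strings are assumptions of the sketch (achiral glycine is not treated specially).

```python
# Chirality arrangements of adjacent residues as listed in the paragraph above.
CHIRALITY_MOTIFS = (
    "D-L", "L-D", "D-L-L-D", "L-D-D-L",
    "L-D-L-L-D", "D-L-D-D-L", "D-L-L-D-L", "L-D-D-L-D",
)


def chirality_string(peptide: str) -> str:
    """Map each residue to 'L' (uppercase code) or 'D' (lowercase code)."""
    return "".join("D" if residue.islower() else "L" for residue in peptide)


def motifs_present(peptide: str):
    """Return the listed motifs that occur among adjacent residues."""
    pattern = chirality_string(peptide)
    return [m for m in CHIRALITY_MOTIFS if m.replace("-", "") in pattern]


# Hypothetical toy peptides, for illustration only.
print(chirality_string("fFRr"))
print(motifs_present("fFRr"))
print(motifs_present("FWR"))
```

An all-L peptide returns an empty list because none of the listed arrangements contains fewer than one D residue.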

[0099] In embodiments, the hydrophobic amino acid residue comprises an aryl or heteroaryl group, each of which is optionally substituted. In embodiments, the hydrophobic amino acid residue comprises an alkyl, alkenyl, or alkynyl side chain, each of which is optionally substituted.

[0100] In embodiments, each amino acid residue comprising a hydrophobic side chain is independently selected from a residue of glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents. The structures of some of these amino acids are provided below. One skilled in the art will recognize that the bond to the hydrogen on the N- and C-terminus of the structures drawn below will be replaced by a bond to an amino acid when incorporated into a CPP or loop region of a nuclease as described herein. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a residue of tryptophan or phenylalanine. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue. In embodiments, at least one of the amino acid residues comprising a hydrophobic side chain is a phenylalanine residue. In embodiments, each of the at least one of the amino acid residues comprising a hydrophobic side chain is a tryptophan residue. The structures of certain of these non-natural aromatic hydrophobic amino acids (prior to incorporation into the peptides disclosed herein) are provided below. In embodiments, each hydrophobic amino acid is independently a hydrophobic aromatic amino acid. In embodiments, the aromatic hydrophobic amino acid is naphthylalanine, 3-(3-benzothienyl)-alanine, phenylglycine, homophenylalanine, phenylalanine, tryptophan, or tyrosine, each of which is optionally substituted with one or more substituents. In embodiments, each hydrophobic amino acid is tryptophan or phenylalanine. In embodiments, each hydrophobic amino acid is tryptophan. In embodiments, the hydrophobic amino acid is tryptophan when the CPP sequence is inserted into the nuclease. In embodiments, each hydrophobic amino acid is phenylalanine.

[0101] In embodiments, the aromatic hydrophobic amino acid is:

[Chemical structures not reproduced; structure labels: 3-(2-quinolyl)-alanine; O-benzylserine; 3-(4-(benzyloxy)phenyl)-alanine; S-(4-methylbenzyl)cysteine; N5-(naphthalen-2-yl)glutamine; 3-(1,1'-biphenyl-4-yl)-alanine; 3-(3-benzothienyl)-alanine]

[0102] The optional substituent can be any atom or group which does not significantly reduce (e.g., by more than 50%) the cytosolic delivery efficiency of the CPP, e.g., compared to an otherwise identical sequence which does not have the substituent. In embodiments, the optional substituent can be a hydrophobic substituent or a hydrophilic substituent. In embodiments, the optional substituent is a hydrophobic substituent. In embodiments, the substituent increases the solvent-accessible surface area (as defined herein) of the hydrophobic amino acid. In embodiments, the substituent can be a halogen, alkyl, alkenyl, alkynyl, cycloalkyl, cycloalkenyl, cycloalkynyl, heterocyclyl, aryl, heteroaryl, alkoxy, aryloxy, acyl, alkylcarbamoyl, alkylcarboxamidyl, alkoxycarbonyl, alkylthio, or arylthio. In embodiments, the substituent is a halogen.

[0103] The size of the hydrophobic amino acid may be selected to improve cytosolic delivery efficiency of the CPP. For example, a larger hydrophobic amino acid may improve cytosolic delivery efficiency compared to an otherwise identical sequence having a smaller hydrophobic amino acid. The size of the hydrophobic amino acid can be measured in terms of molecular weight of the hydrophobic amino acid, the steric effects of the hydrophobic amino acid, the solvent-accessible surface area (SASA) of the side chain, or combinations thereof. In embodiments, the size of the hydrophobic amino acid is measured in terms of the molecular weight of the hydrophobic amino acid, and the larger hydrophobic amino acid has a side chain with a molecular weight of at least about 90 g/mol, or at least about 130 g/mol, or at least about 141 g/mol. In embodiments, the size of the amino acid is measured in terms of the SASA of the hydrophobic side chain, and the larger hydrophobic amino acid has a side chain with a SASA greater than alanine, or greater than glycine. In embodiments, the hydrophobic amino acid(s) have a hydrophobic side chain with a SASA greater than or equal to about piperidine-2-carboxylic acid, greater than or equal to about tryptophan, greater than or equal to about phenylalanine, or equal to or greater than about naphthylalanine. In embodiments, the hydrophobic amino acid(s) have a side chain with a SASA of at least about 200 Å², at least about 210 Å², at least about 220 Å², at least about 240 Å², at least about 250 Å², at least about 260 Å², at least about 270 Å², at least about 280 Å², at least about 290 Å², at least about 300 Å², at least about 310 Å², at least about 320 Å², at least about 330 Å², at least about 350 Å², at least about 360 Å², at least about 370 Å², at least about 380 Å², at least about 390 Å², at least about 400 Å², at least about 410 Å², at least about 420 Å², at least about 430 Å², at least about 440 Å², at least about 450 Å², at least about 460 Å², at least about 470 Å², at least about 480 Å², at least about 490 Å², greater than about 500 Å², at least about 510 Å², at least about 520 Å², at least about 530 Å², at least about 540 Å², at least about 550 Å², at least about 560 Å², at least about 570 Å², at least about 580 Å², at least about 590 Å², at least about 600 Å², at least about 610 Å², at least about 620 Å², at least about 630 Å², at least about 640 Å², greater than about 650 Å², at least about 660 Å², at least about 670 Å², at least about 680 Å², at least about 690 Å², or at least about 700 Å².

[0104] As used herein, “hydrophobic surface area” or “SASA” refers to the surface area (reported as square Angstroms; Å²) of an amino acid side chain that is accessible to a solvent. In embodiments, SASA is calculated using the 'rolling ball' algorithm developed by Shrake & Rupley (J Mol Biol. 79(2):351-71), which is herein incorporated by reference in its entirety for all purposes. This algorithm uses a “sphere” of solvent of a particular radius to probe the surface of the molecule. A typical value of the sphere radius is 1.4 Å, which approximates the radius of a water molecule.
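For illustration, a minimal numerical sketch in the style of the Shrake and Rupley procedure is given below in Python. Each atom's radius is expanded by the probe radius, test points are distributed over the expanded sphere, and the accessible area is the fraction of points not lying inside any neighboring expanded sphere. The golden-spiral point distribution, the number of test points, the function names, and the input atom coordinates and radii are assumptions of this sketch rather than parameters given in the disclosure.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe_radius=1.4, n_points=960):
    """Per-atom solvent-accessible surface area in square angstroms.

    `atoms` is a list of (x, y, z, radius) tuples in angstroms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded_i = ri + probe_radius
        accessible = 0
        for ux, uy, uz in unit:
            px = xi + expanded_i * ux
            py = yi + expanded_i * uy
            pz = zi + expanded_i * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                expanded_j = rj + probe_radius
                dist_sq = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if dist_sq < expanded_j * expanded_j:
                    buried = True               # point lies inside a neighbor
                    break
            if not buried:
                accessible += 1
        full_area = 4.0 * math.pi * expanded_i ** 2
        areas.append(full_area * accessible / n_points)
    return areas


# Hypothetical two-atom input: overlapping spheres bury part of each surface.
print(shrake_rupley([(0.0, 0.0, 0.0, 1.6), (2.0, 0.0, 0.0, 1.6)]))
```

An isolated atom returns the full area of its expanded sphere, and side-chain SASA would be obtained by summing the per-atom values over the side-chain atoms.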

[0105] SASA values for certain side chains are shown below in Table C. In certain embodiments, the SASA values described herein are based on the theoretical values listed in Table C below, as reported by Tien, et al. (PLOS ONE 8(11): e80635. doi.org/10.1371/journal.pone.0080635), which is herein incorporated by reference in its entirety for all purposes.

Table C. SASA values for amino acid side chains

[0106] In embodiments, the CPPs described herein comprise at least two or at least three arginine residues. In embodiments, the CPPs described herein comprise at least one, two, or three amino acid residues independently having a hydrophobic side chain. In embodiments, the CPP sequence described herein comprises at least three arginine residues and at least three tryptophan residues.

[0107] In embodiments, the at least three arginines and the at least three amino acids having a hydrophobic side chain together constitute a CPP and may be inserted into one loop of a nuclease. When the nuclease has more than one looped region, a CPP may be inserted into more than one looped region. In embodiments, a CPP with at least three arginines is inserted into a first loop. In embodiments, the at least three arginines are considered a CPP. In embodiments, the at least three amino acids with a hydrophobic side chain are inserted into a second loop. In embodiments, the at least three hydrophobic amino acids are considered a CPP.

[0108] In embodiments, the CPPs may include any combination of at least three arginines and at least one, two, or three hydrophobic amino acids described herein. In embodiments, the CPPs described herein comprise at least three arginines and at least three hydrophobic amino acids described herein. In embodiments, the CPPs described herein comprise at least three arginines and at least four hydrophobic amino acid residues described herein. In embodiments, the CPPs described herein comprise at least four arginines and at least three hydrophobic amino acids described herein. In embodiments, the CPPs described herein comprise at least four arginines and at least four hydrophobic amino acids described herein.
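As an illustrative sketch, the arginine and hydrophobic-residue counts discussed above can be tallied for a candidate sequence in Python. The sketch is limited to one-letter codes of natural amino acids; the hydrophobic set used (G, A, V, L, I, M, F, W, P, Y) corresponds to the natural residues named in paragraph [0100], and the function names, default thresholds, and toy peptide strings are hypothetical. Lowercase (D) codes are counted together with their uppercase (L) counterparts.

```python
# Natural residues named in paragraph [0100] as having a hydrophobic side chain.
HYDROPHOBIC_NATURAL = set("GAVLIMFWPY")


def cpp_composition(peptide: str) -> dict:
    """Count length, arginines, and hydrophobic residues (case-insensitive)."""
    residues = peptide.upper()
    return {
        "length": len(residues),
        "arginines": residues.count("R"),
        "hydrophobic": sum(aa in HYDROPHOBIC_NATURAL for aa in residues),
    }


def meets_minimums(peptide: str, min_arg: int = 3, min_hydrophobic: int = 3) -> bool:
    """True if the peptide has at least the given numbers of each residue type."""
    counts = cpp_composition(peptide)
    return (
        counts["arginines"] >= min_arg
        and counts["hydrophobic"] >= min_hydrophobic
    )


# Hypothetical toy peptides, for illustration only.
print(cpp_composition("RRRWWW"))
print(meets_minimums("RRRWWW"))
print(meets_minimums("RRRRWWW", min_arg=4, min_hydrophobic=4))
```

Non-natural residues would require a richer representation than single characters and are outside the scope of this sketch.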

[0109] In embodiments, an arginine is adjacent to a hydrophobic amino acid. In embodiments, the arginine residue has the same chirality as the hydrophobic amino acid residue. In embodiments, at least two arginine residues are adjacent to each other. In still other embodiments, three arginine residues are adjacent to each other. In embodiments, at least two hydrophobic amino acid residues are adjacent to each other. In embodiments, at least three hydrophobic amino acid residues are adjacent to each other. In embodiments, the CPPs described herein comprise at least two consecutive hydrophobic amino acid residues and at least two consecutive arginine residues. In embodiments, the CPPs described herein comprise at least two hydrophobic amino acids and at least two arginine residues. In embodiments, one hydrophobic amino acid is adjacent to one of the arginines. In embodiments, the CPPs described herein comprise at least three consecutive hydrophobic amino acid residues and three consecutive arginine residues. In embodiments, one hydrophobic amino acid is adjacent to one of the arginines. These various combinations of amino acids can have any arrangement of D and L amino acids.

[0110] In embodiments, the CPPs described herein comprise at least two hydrophobic amino acid residues and at least two arginine residues, wherein the at least two hydrophobic residues are separated from each other by an intervening amino acid and the at least two arginine residues are separated by an intervening amino acid. In embodiments, the hydrophobic residues have the same chirality. In embodiments, the arginine residues have the same chirality. In embodiments, the intervening amino acid does not have the same chirality as the arginine residues.

[0111] In embodiments in which the CPP is conjugated to the nuclease, the CPP may comprise a residue of lysine, glutamine, glutamic acid, asparagine, aspartic acid, or an amino acid comprising an -SH group.

[0112] In embodiments, the CPP may be or include any of the sequences listed in Table D. In embodiments, the CPPs used in the modified loop nucleases and/or conjugated to the nucleases as disclosed herein may comprise any one of the sequences in Table D or comprise any one of the sequences listed in Table D, along with additional amino acids (e.g., lysine, glutamine, glutamic acid, asparagine, aspartic acid). In embodiments, the sequences in Table D may be cyclized by forming a peptide bond between the terminal amino acids. In embodiments, one or more amino acids may be added to the sequences below to mediate cyclization and/or conjugation to nucleases. Such amino acids include, but are not limited to, amino acids that include a thiol (-SH) group, such as cysteine, or amino acids with functional groups that may be used to conjugate the CPP to the nuclease (e.g., through a linker), such as lysine, glutamine, glutamic acid, asparagine, aspartic acid.

Table D. CPP sequences

phosphothreonine; Pip, L-piperidine-2-carboxylic acid; Cha, L-3-cyclohexyl-alanine; Tm, trimesic acid; Dap, L-2,3-diaminopropionic acid; Sar, sarcosine; F2Pmp, L-difluorophosphonomethyl phenylalanine; Dod, dodecanoyl; Pra, L-propargylglycine; AzK, L-6-azido-2-amino-hexanoic acid; Agp, L-2-amino-3-guanidinylpropionic acid.

* each W may be independently replaced with phenylalanine (F or f) or tyrosine (Y or y).

[0113] As used herein, “cytosolic delivery efficiency” refers to the ability of a construct described herein comprising a CPP to traverse a cell membrane and enter the cytosol. In embodiments, cytosolic delivery efficiency of the construct comprising the CPP is not dependent on a receptor or a cell type. Cytosolic delivery efficiency can refer to absolute cytosolic delivery efficiency or relative cytosolic delivery efficiency.

[0114] Absolute cytosolic delivery efficiency is the ratio of cytosolic concentration of a construct comprising a CPP over the concentration of the construct comprising the CPP in the growth medium. Relative cytosolic delivery efficiency refers to the concentration of a construct comprising a CPP in the cytosol compared to the concentration of a control construct not comprising a CPP in the cytosol. Quantification can be achieved by fluorescently labeling the protein (e.g., with a FITC dye) and measuring the fluorescence intensity using techniques well-known in the art.
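The two quantities defined above can be written as simple ratios, as in the following Python sketch. Expressing relative efficiency as a ratio of the two cytosolic concentrations is one reading of “compared to” and is an assumption of this sketch, as are the function names and the example concentration values; concentrations must share the same units.

```python
def absolute_efficiency(cytosolic_conc: float, medium_conc: float) -> float:
    """Cytosolic concentration of the CPP construct divided by its
    concentration in the growth medium."""
    if medium_conc <= 0:
        raise ValueError("medium concentration must be positive")
    return cytosolic_conc / medium_conc


def relative_efficiency(cytosolic_conc_cpp: float, cytosolic_conc_control: float) -> float:
    """Cytosolic concentration of the CPP construct divided by that of a
    control construct lacking a CPP (ratio form assumed for this sketch)."""
    if cytosolic_conc_control <= 0:
        raise ValueError("control concentration must be positive")
    return cytosolic_conc_cpp / cytosolic_conc_control


# Hypothetical concentrations in arbitrary but consistent units.
print(absolute_efficiency(0.5, 5.0))
print(relative_efficiency(2.0, 0.5))
```

When concentrations are inferred from fluorescence intensity, the same ratios apply provided the signal is calibrated to concentration for each compartment.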

Looped Nucleases

[0115] In embodiments, the present disclosure provides modified looped nucleases comprising one or more loop regions, wherein at least one loop region comprises a CPP sequence inserted into the loop. The term “looped nuclease” refers to a nuclease with a secondary structure comprising one or more looped regions. Loops refer to regions of the protein other than alpha helices and beta-strands. Structurally, loops are generally located in regions where there is a change in direction in the secondary structure. In embodiments, the change in direction can be at least 120 degrees. In embodiments, the change of direction is determined across 200 amino acids or less. Loops that have only 4 or 5 amino acid residues which participate in internal hydrogen bonding are referred to as “turns”. Protein loops include beta turns and omega loops. The most common types of loops and turns cause a change in direction of the polypeptide chain allowing it to fold back on itself to create a more compact structure. Looped regions in nucleases can be determined by means known in the art, such as queries of the Loops in Proteins database (see Michalsky and Preissner, Loops In Proteins (LIP) - a comprehensive loop database for homology modelling. Protein Engineering, Design, and Selection. (2003) 16(12):979-985), and the online protein fold recognition server Phyre2 (Kelley et al., The Phyre2 Web Portal For Protein Modeling, Prediction And Analysis. Nat. Protoc. 2015, 10(6), 845-858).
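As a simple illustration of the definition of loops as regions other than alpha helices and beta-strands, the Python sketch below extracts loop ranges from a per-residue secondary structure string. The use of 'H' for helix and 'E' for strand follows a common annotation style, but the input string, the function name, and the choice to count unstructured termini as loops are assumptions of this sketch; it does not query the databases or servers named above.

```python
def loop_regions(secondary_structure: str, helix: str = "H", strand: str = "E"):
    """Return 1-based inclusive (start, end) ranges of residues that are
    annotated as neither helix nor strand."""
    regions = []
    start = None
    for position, state in enumerate(secondary_structure, start=1):
        in_loop = state not in (helix, strand)
        if in_loop and start is None:
            start = position                      # a loop begins here
        elif not in_loop and start is not None:
            regions.append((start, position - 1))  # the loop ended one residue back
            start = None
    if start is not None:                          # loop runs to the C-terminus
        regions.append((start, len(secondary_structure)))
    return regions


# Hypothetical annotation string: H = helix, E = strand, other = loop/turn.
print(loop_regions("HHHH--EEEE-TT-HH"))
```

The resulting ranges can then be compared against candidate insertion positions for a given nuclease.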

[0116] Looped regions in nucleases may be annotated within online databases, such as UniProt. For example, the secondary structure of Cas9 from Streptococcus pyogenes serotype M1 (UniProt Accession Number Q99ZW2) (SEQ ID NO: 1) is annotated within the Structure section of UniProt. In embodiments, Cas9 from Streptococcus pyogenes serotype M1 comprises a CPP sequence disclosed herein in one or more of Cas9’s loop regions, which are described in Table E. The amino acid ranges contained in Table E are numbered with respect to SEQ ID NO: 1.

Table E: Looped Regions of Cas9 from Streptococcus pyogenes serotype Ml


[0117] In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 23-25 of Cas9 (SEQ ID NO: 1), for example, amino acid 23, 24, 25, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 103 - 105 of Cas9 (SEQ ID NO: 1), for example, amino acid 103, 104, 105, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 117 - 119 of Cas9 (SEQ ID NO: 1), for example, amino acid 117, 118, 119, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 196 - 198 of Cas9 (SEQ ID NO: 1), for example, amino acid 196, 197, 198, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 253 - 257 of Cas9 (SEQ ID NO: 1), for example, amino acid 253, 254, 255, 256, 257, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 300 - 305 of Cas9 (SEQ ID NO: 1), for example, amino acid 300, 301, 302, 303, 304, 305, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 427 - 429 of Cas9 (SEQ ID NO: 1), for example, amino acid 427, 428, 429, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 450 - 452 of Cas9 (SEQ ID NO: 1), for example, amino acid 450, 451, 452, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 475 - 477 of Cas9 (SEQ ID NO: 1), for example, amino acid 475, 476, 477, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 532 - 534 of Cas9 (SEQ ID NO: 1), for example, amino acid 532, 533, 534, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 552 - 555 of Cas9 (SEQ ID NO: 1), for example, amino acid 552, 553, 554, 555, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 568 - 573 of Cas9 (SEQ ID NO: 1), for example, amino acid 568, 569, 570, 571, 572, 573, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 673 - 675 of Cas9 (SEQ ID NO: 1), for example, amino acid 673, 674, 675, or combinations thereof.
In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 687 - 689 of Cas9 (SEQ ID NO: 1), for example, amino acid 687, 688, 689, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 751 - 753 of Cas9 (SEQ ID NO: 1), for example, amino acid 751, 752, 753, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 771 - 774 of Cas9 (SEQ ID NO: 1), for example, amino acid 771, 772, 773, 774, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 817 - 819 of Cas9 (SEQ ID NO: 1), for example, amino acid 817, 818, 819, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 844 - 846 of Cas9 (SEQ ID NO: 1), for example, amino acid 844, 845, 846, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, amino acid 1053, 1054, 1055, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, amino acid 1067, 1068, 1069, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, amino acid 1076, 1077, 1078, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, amino acid 1152, 1153, 1154, 1155, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, amino acid 1168, 1169, 1170, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, amino acid 1262, 1263, 1264, or combinations thereof. In embodiments, a CPP replaces one or more of, or is inserted between one or more of, amino acids 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, amino acid 1297, 1298, 1299, or combinations thereof.

[0118] In embodiments, a CPP is inserted immediately after an amino acid within the range 23 - 25 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 23, 24, or 25. In embodiments, a CPP is inserted immediately after an amino acid within the range 103 - 105 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 103, 104, or 105. In embodiments, a CPP is inserted immediately after an amino acid within the range 117 - 119 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 117, 118, or 119. In embodiments, a CPP is inserted immediately after an amino acid within the range 196 - 198 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 196, 197, or 198. In embodiments, a CPP is inserted immediately after an amino acid within the range 253 - 257 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 253, 254, 255, 256, or 257. In embodiments, a CPP is inserted immediately after an amino acid within the range 300 - 305 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 300, 301, 302, 303, 304, or 305. In embodiments, a CPP is inserted immediately after an amino acid within the range 427 - 429 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 427, 428, or 429. In embodiments, a CPP is inserted immediately after an amino acid within the range 450 - 452 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 450, 451, or 452. In embodiments, a CPP is inserted immediately after an amino acid within the range 475 - 477 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 475, 476, or 477. In embodiments, a CPP is inserted immediately after an amino acid within the range 532 - 534 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 532, 533, or 534. In embodiments, a CPP is inserted immediately after an amino acid within the range 552 - 555 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 552, 553, 554, or 555. In embodiments, a CPP is inserted immediately after an amino acid within the range 568 - 573 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 568, 569, 570, 571, 572, or 573. In embodiments, a CPP is inserted immediately after an amino acid within the range 673 - 675 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 673, 674, or 675. In embodiments, a CPP is inserted immediately after an amino acid within the range 687 - 689 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 687, 688, or 689. In embodiments, a CPP is inserted immediately after an amino acid within the range 751 - 753 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 751, 752, or 753. In embodiments, a CPP is inserted immediately after an amino acid within the range 771 - 774 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 771, 772, 773, or 774. In embodiments, a CPP is inserted immediately after an amino acid within the range 817 - 819 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 817, 818, or 819. In embodiments, a CPP is inserted immediately after an amino acid within the range 844 - 846 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 844, 845, or 846. In embodiments, a CPP is inserted immediately after an amino acid within the range 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1053, 1054, or 1055.
In embodiments, a CPP is inserted immediately after an amino acid within the range 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1067, 1068, or 1069. In embodiments, a CPP is inserted immediately after an amino acid within the range 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1076, 1077, or 1078. In embodiments, a CPP is inserted immediately after an amino acid within the range 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1152, 1153, 1154, or 1155. In embodiments, a CPP is inserted immediately after an amino acid within the range 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1168, 1169, or 1170. In embodiments, a CPP is inserted immediately after an amino acid within the range 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1262, 1263, or 1264. In embodiments, a CPP is inserted immediately after an amino acid within the range 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, immediately after amino acid 1297, 1298, or 1299.

[0119] In embodiments, a CPP is inserted immediately before an amino acid within the range 23 - 25 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 23, 24, or 25. In embodiments, a CPP is inserted immediately before an amino acid within the range 103 - 105 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 103, 104, or 105. In embodiments, a CPP is inserted immediately before an amino acid within the range 117 - 119 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 117, 118, or 119. In embodiments, a CPP is inserted immediately before an amino acid within the range 196 - 198 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 196, 197, or 198. In embodiments, a CPP is inserted immediately before an amino acid within the range 253 - 257 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 253, 254, 255, 256, or 257. In embodiments, a CPP is inserted immediately before an amino acid within the range 300 - 305 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 300, 301, 302, 303, 304, or 305. In embodiments, a CPP is inserted immediately before an amino acid within the range 427 - 429 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 427, 428, or 429. In embodiments, a CPP is inserted immediately before an amino acid within the range 450 - 452 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 450, 451, or 452. In embodiments, a CPP is inserted immediately before an amino acid within the range 475 - 477 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 475, 476, or 477. In embodiments, a CPP is inserted immediately before an amino acid within the range 532 - 534 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 532, 533, or 534. 
In embodiments, a CPP is inserted immediately before an amino acid within the range 552 - 555 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 552, 553, 554, or 555. In embodiments, a CPP is inserted immediately before an amino acid within the range 568 - 573 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 568, 569, 570, 571, 572, or 573. In embodiments, a CPP is inserted immediately before an amino acid within the range 673 - 675 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 673, 674, or 675. In embodiments, a CPP is inserted immediately before an amino acid within the range 687 - 689 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 687, 688, or 689. In embodiments, a CPP is inserted immediately before an amino acid within the range 751 - 753 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 751, 752, or 753. In embodiments, a CPP is inserted immediately before an amino acid within the range 771 - 774 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 771, 772, 773, or 774. In embodiments, a CPP is inserted immediately before an amino acid within the range 817 - 819 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 817, 818, or 819. In embodiments, a CPP is inserted immediately before an amino acid within the range 844 - 846 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 844, 845, or 846. In embodiments, a CPP is inserted immediately before an amino acid within the range 1053 - 1055 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1053, 1054, or 1055. In embodiments, a CPP is inserted immediately before an amino acid within the range 1067 - 1069 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1067, 1068, or 1069. In embodiments, a CPP is inserted immediately before an amino acid within the range 1076 - 1078 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1076, 1077, or 1078. In embodiments, a CPP is inserted immediately before an amino acid within the range 1152 - 1155 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1152, 1153, 1154, or 1155. In embodiments, a CPP is inserted immediately before an amino acid within the range 1168 - 1170 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1168, 1169, or 1170. In embodiments, a CPP is inserted immediately before an amino acid within the range 1262 - 1264 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1262, 1263, or 1264. In embodiments, a CPP is inserted immediately before an amino acid within the range 1297 - 1299 of Cas9 (SEQ ID NO: 1), for example, immediately before amino acid 1297, 1298, or 1299.
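The three placement modes recited above (replacement of loop residues, insertion immediately after a residue, and insertion immediately before a residue) can be sketched as string operations on a 1-indexed amino acid sequence. This is a minimal illustration under stated assumptions: the function names are hypothetical, the ten-residue sequence is a toy stand-in rather than SEQ ID NO: 1, and `RRRR` is a placeholder rather than a CPP sequence of this disclosure.

```python
def _check(seq: str, pos: int) -> None:
    """Validate a 1-indexed residue position."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside 1..{len(seq)}")


def insert_after(seq: str, pos: int, cpp: str) -> str:
    """Insert cpp immediately after 1-indexed residue pos."""
    _check(seq, pos)
    return seq[:pos] + cpp + seq[pos:]


def insert_before(seq: str, pos: int, cpp: str) -> str:
    """Insert cpp immediately before 1-indexed residue pos."""
    _check(seq, pos)
    return seq[:pos - 1] + cpp + seq[pos - 1:]


def replace_range(seq: str, start: int, end: int, cpp: str) -> str:
    """Replace 1-indexed residues start..end (inclusive) with cpp."""
    _check(seq, start)
    _check(seq, end)
    if start > end:
        raise ValueError("start must not exceed end")
    return seq[:start - 1] + cpp + seq[end:]


toy = "ACDEFGHIKL"   # toy sequence, not SEQ ID NO: 1
cpp = "RRRR"         # placeholder, not a CPP of this disclosure
print(insert_after(toy, 3, cpp))      # ACDRRRREFGHIKL
print(insert_before(toy, 3, cpp))     # ACRRRRDEFGHIKL
print(replace_range(toy, 3, 5, cpp))  # ACRRRRGHIKL
```

Under this numbering, insertion immediately before residue n yields the same product as insertion immediately after residue n-1, so the "before" and "after" embodiments overlap except at the first and last residue of each recited range.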

[0120] In the modified looped proteins described herein, CPP motifs are fused into the loop regions of nucleases, rather than at the N- or C-terminus. In embodiments, insertion of a short CPP peptide into a surface loop or replacement of the original loop sequence with a CPP is expected to constrain the CPP sequence into a “cyclic” like conformation, which is expected to enhance the proteolytic stability of the CPP sequence. In embodiments, the “cyclic” like conformation of a loop-embedded CPP may mimic that of a cyclic CPP and potentially enhance its cellular entry efficiency (cyclic CPPs have greater cytosolic uptake efficiency compared to linear CPPs). Previous studies have shown that insertion of proper peptide sequences into surface loops of a protein often causes only minor destabilization of the protein structure (Scalley-Kim et al. Protein Science 2003, 12, 197-206).

[0121] Another important consideration is the CPP sequence. CPPs are thought to escape the endosome by binding to the intraluminal membrane and inducing CPP-enriched lipid domains to bud off the endosomal membrane as tiny vesicles, which then disintegrate into amorphous lipid/CPP aggregates inside the cytoplasm (Qian et al., Biochemistry 2016, 55, 2601-2612). Amphipathic CPPs likely facilitate endosomal escape by stabilizing the budding neck structure, which features simultaneous positive and negative membrane curvatures in orthogonal directions (or negative Gaussian curvature), as the hydrophobic group(s) can insert into the membrane to generate positive curvature, while the arginine residues bring the phospholipid head groups together to induce negative curvature (Dougherty et al., Understanding Cell Penetration of Cyclic Peptides. Chem. Rev. 2019, 119, 10241-10287). In addition, the most active cyclic CPPs (e.g., cyclo(Phe-phe-Nal-Arg-arg-Arg-arg-Gln) (SEQ ID NO: 126), where phe is D-phenylalanine, Nal is L-naphthylalanine, and arg is D-arginine) contain D- as well as L-amino acids at roughly alternating positions. See Qian et al., Biochemistry 2016, 55, 2601-2612. It is hypothesized that the specific spatial arrangement of the hydrophobic and positively charged side chains in a cyclic conformation may facilitate the formation of negative Gaussian curvature at the budding neck, which is an obligatory intermediate of any budding event.

[0122] In embodiments, the modified looped nucleases described herein further comprise a detectable tag. Examples of detectable tags include, but are not limited to, FLAG tags, poly-histidine tags (e.g. 6xHis (SEQ ID NO: 127)), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent proteins, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins. In embodiments, the fluorescent protein is selected from blue/UV proteins (such as BFP, TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, and T-Sapphire); cyan proteins (such as CFP, eCFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, and mTFP1); green proteins (such as GFP, eGFP, meGFP (A208K mutation), Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, and mNeonGreen); yellow proteins (such as YFP, eYFP, Citrine, Venus, SYFP2, and TagYFP); orange proteins (such as Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, and mOrange2); red proteins (such as RFP, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, and mRuby2); far-red proteins (such as mPlum, HcRed-Tandem, mKate2, mNeptune, and NirFP); near-infrared proteins (such as TagRFP657, IFP1.4, and iRFP); long Stokes shift proteins (such as mKeima Red, LSS-mKate1, LSS-mKate2, and mBeRFP); photoactivatable proteins (such as PA-GFP, PAmCherry1, and PATagRFP); photoconvertible proteins (such as Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, PS-CFP2, mEos2 (green), mEos2 (red), mEos3.2 (green), mEos3.2 (red), PSmOrange, and PSmOrange); and photoswitchable proteins (such as Dronpa). In embodiments, the detectable tag can be selected from AmCyan, AsRed, DsRed2, DsRed Express, E2-Crimson, HcRed, ZsGreen, ZsYellow, mCherry, mStrawberry, mOrange, mBanana, mPlum, mRaspberry, tdTomato, DsRed Monomer, and/or AcGFP, all of which are available from Clontech.

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Gene-Editing Machinery

[0123] In embodiments, the compounds disclosed herein comprise one or more CPP (or cCPP) conjugated to one or more components of a CRISPR-Cas gene-editing system. In embodiments, the compounds disclosed herein comprise one or more CPP (or cCPP) conjugated to an expression vector encoding one or more components of a CRISPR-Cas gene-editing system. In some embodiments, a linker conjugates the CPP (or cCPP) to the component of the CRISPR-Cas gene-editing system. Any linker described in this disclosure or that is known to a person of skill in the art may be utilized.

[0124] In embodiments, a compound comprising a CPP conjugated to a nuclease or fragment thereof is provided. In embodiments, a compound comprising a CPP conjugated to a DNA sequence (e.g. an expression vector, for example, an expression vector encoding a nuclease, an RNA, or both) is provided. In embodiments, a compound comprising a CPP conjugated to an RNA sequence (e.g. a gRNA) is provided. In embodiments, a CPP is conjugated to multiple cargos. In embodiments, a CPP is conjugated to RNA and DNA cargos. In embodiments, a CPP is conjugated to RNA and polypeptide cargos. In embodiments, a CPP is conjugated to DNA and polypeptide cargos.

Nucleases

[0125] In embodiments, the construct described herein comprises one or more nucleases. In embodiments, one or more CPPs are conjugated to one or more nucleases. In embodiments, the construct described herein comprises an expression vector encoding one or more nucleases. In embodiments, one or more CPPs are conjugated to an expression vector encoding one or more nucleases.

[0126] The term “nuclease,” as used herein, refers to a protein that cleaves a phosphodiester bond connecting nucleotide residues in a nucleic acid molecule.

[0127] In embodiments, the nuclease is an endonuclease. An endonuclease cleaves a phosphodiester bond within a polynucleotide chain. In embodiments, the endonuclease cuts a double-stranded nucleic acid target site symmetrically, i.e., cutting both strands at the same position so that the ends comprise base-paired nucleotides, also referred to herein as blunt ends. In embodiments, the endonuclease cuts a double-stranded nucleic acid target site asymmetrically, i.e., cutting each strand at a different position so that the ends comprise unpaired nucleotides. Unpaired nucleotides at the end of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., as “5'-overhang” or as “3'-overhang,” depending on whether the unpaired nucleotide(s) form(s) the 5' or the 3' end of the respective DNA strand. Double-stranded DNA molecule ends ending with unpaired nucleotide(s) are also referred to as sticky ends, as they can “stick to” other double-stranded DNA molecule ends comprising complementary unpaired nucleotide(s). In embodiments, the endonuclease is a DNA restriction nuclease. The target sites of DNA restriction nucleases are well known to those of skill in the art. In embodiments, a restriction nuclease, such as EcoRI, HindIII, or BamHI, recognizes a palindromic, double-stranded DNA target site of 4 to 10 base pairs in length, and cuts each of the two DNA strands at a specific position within the target site.
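The distinction between symmetric (blunt) and asymmetric (sticky) cutting can be illustrated with a short sketch. It assumes a palindromic recognition site and takes the top-strand cut position as input; the bottom-strand cut then follows from the symmetry of the site. The function names are hypothetical. The cut positions used in the example (EcoRI, G^AATTC; SmaI, CCC^GGG) are the commonly reported ones and are supplied here for illustration rather than taken from this disclosure.

```python
from typing import Tuple

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq))


def end_type(site: str, top_cut: int) -> Tuple[str, str]:
    """Classify the ends left by cutting a palindromic site.

    top_cut is the number of top-strand nucleotides to the left of the cut.
    By symmetry, the bottom strand is cut len(site) - top_cut nucleotides
    from the left, expressed in top-strand coordinates.
    Returns (end kind, unpaired sequence read 5'->3').
    """
    if site != reverse_complement(site):
        raise ValueError("site is not palindromic")
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return ("blunt", "")
    lo, hi = sorted((top_cut, bottom_cut))
    kind = "5'-overhang" if top_cut < bottom_cut else "3'-overhang"
    return (kind, site[lo:hi])


print(end_type("GAATTC", 1))  # ("5'-overhang", 'AATT')  EcoRI-style cut
print(end_type("CCCGGG", 3))  # ('blunt', '')            SmaI-style cut
```

Because the site is palindromic, the unpaired sequence is the same on both resulting ends, which is why such ends can anneal to one another.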

[0128] In embodiments, the nuclease is an exonuclease. An exonuclease cleaves a phosphodiester bond at the end of the polynucleotide chain.

[0129] In embodiments, a nuclease is a site-specific nuclease, binding and/or cleaving a specific phosphodiester bond within a specific nucleotide sequence, which is also referred to herein as the “recognition sequence,” the “nuclease target site,” or the “target site.”

[0130] In embodiments, a nuclease is a deoxyribonuclease (referred to as DNase or DNA nuclease). A DNA nuclease catalyzes the hydrolytic cleavage of phosphodiester bonds in the DNA backbone. In embodiments, the DNA nuclease is deoxyribonuclease I, deoxyribonuclease II, or micrococcal nuclease. In embodiments, the DNA nuclease is an endonuclease. In embodiments, the DNA nuclease is a Type I nuclease (e.g. restriction enzyme that cleaves away from the recognition site). In embodiments, the DNA nuclease is a Type II nuclease (e.g. restriction enzyme that cleaves within or close to the recognition site). In embodiments, the nuclease is a Type V nuclease.

[0131] In embodiments, a nuclease is a ribonuclease (referred to as RNase or RNA nuclease). An RNA nuclease catalyzes the hydrolytic cleavage of phosphodiester bonds in the RNA backbone. In embodiments, the RNA nuclease is RNaseA, RNaseH, RNase III, RNase L, RNaseP, RNase PhyM, RNase T1, RNase T2, RNase U2, RNase V, RNaseE, PNPase, RNase PH, RNase R, RNase D, RNase T, oligoribonuclease, exoribonuclease I, exoribonuclease II, or RNaseG. In embodiments, the RNA nuclease is a Type VI nuclease.

[0132] In embodiments, the nuclease is a DNA and RNA nuclease. In embodiments, the DNA and RNA nuclease is a Type III nuclease.

[0133] In embodiments, the nuclease is a Type II, Type V-A, Type V-B, Type V-C, Type V-U, or Type VI-B nuclease.

[0134] In embodiments, a nuclease recognizes a single-stranded target site. In embodiments, a nuclease recognizes a double-stranded target site, for example a double-stranded DNA target site.

[0135] In embodiments, a nuclease comprises a “binding domain” that mediates the interaction of the protein with the nucleic acid substrate. In embodiments, the nuclease specifically binds to a target site. In embodiments, a nuclease comprises a “cleavage domain” that catalyzes the cleavage of the phosphodiester bond within the nucleic acid backbone.

[0136] In embodiments, a nuclease binds and cleaves a nucleic acid molecule in a monomeric form. In embodiments, a nuclease protein has to dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding domains and cleavage domains of naturally occurring nucleases, as well as modular binding domains and cleavage domains that can be fused to create nucleases binding specific target sites, are well known to those of skill in the art. For example, zinc fingers or transcriptional activator like elements can be used as binding domains to specifically bind a desired target site, and fused or conjugated to a cleavage domain, for example, the cleavage domain of FokI, to create an engineered nuclease cleaving the target site.

[0137] In embodiments, the nuclease is part of a CRISPR-Cas system. CRISPR-Cas systems fall into two classes: Class 1 systems use a complex of multiple Cas proteins to degrade foreign nucleic acids. In contrast, Class 2 systems use a single large Cas protein for the same purpose. In embodiments, the CRISPR-Cas system is a Class 1 system. In embodiments, the Class 1 system is a Type I system. In embodiments, the Class 1, Type I system is a I-A, I-B, I-C, I-D, I-E, I-F, or I-G subtype. In embodiments, the Class 1, Type I system incorporates Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, and/or GSU0054 or a variant thereof. In embodiments, the Class 1 system is a Type III system. In embodiments, the Class 1, Type III system is a III-A, III-B, III-C, III-D, III-E, or III-F subtype. In embodiments, the Class 1, Type III system incorporates Cas10, Csm2, Cmr5, Cas10, Csx11, and/or Csx10 or a variant thereof. In embodiments, the Class 1 system is a Type IV system. In embodiments, the Class 1, Type IV system is a IV-A, IV-B, or IV-C subtype. In embodiments, the Class 1, Type IV system incorporates Csf1 or a variant thereof.

[0138] In embodiments, the CRISPR-Cas system is a Class 2 system. In embodiments, the Class 2 system is a Type II system. In embodiments, the Class 2, Type II system is a II-A, II-B, or II-C subtype. In embodiments, the Class 2, Type II system incorporates Cas9, Csn2, and/or Cas4 or a variant thereof. In embodiments, the Class 2 system is a Type V system. In embodiments, the Class 2, Type V system is a V-A, V-B, V-C, V-D, V-E, V-F, V-G, V-H, V-I, V-K, or V-U subtype. In embodiments, the Class 2, Type V system incorporates Cas12, Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (Cas14, C2c10), Cas12g, Cas12h, Cas12i, Cas12k (C2c5), C2c4, C2c8, and/or C2c9 or a variant thereof. In embodiments, the Class 2 system is a Type VI system. In embodiments, the Class 2, Type VI system is a VI-A, VI-B, VI-C, or VI-D subtype. In embodiments, the Class 2, Type VI system incorporates Cas13, Cas13a (C2c2), Cas13b, Cas13c, and/or Cas13d or a variant thereof.
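For orientation, the class, type, and subtype hierarchy recited in the two preceding paragraphs can be captured in a small lookup structure. The subtype lists below are transcribed from this disclosure; the dictionary layout and the function name `classify` are illustrative assumptions and do not represent an authoritative or complete CRISPR-Cas classification.

```python
from typing import Dict, List, Optional, Tuple

# Class -> Type -> subtypes, as recited in this disclosure
CRISPR_TAXONOMY: Dict[str, Dict[str, List[str]]] = {
    "Class 1": {
        "Type I": ["I-A", "I-B", "I-C", "I-D", "I-E", "I-F", "I-G"],
        "Type III": ["III-A", "III-B", "III-C", "III-D", "III-E", "III-F"],
        "Type IV": ["IV-A", "IV-B", "IV-C"],
    },
    "Class 2": {
        "Type II": ["II-A", "II-B", "II-C"],
        "Type V": ["V-A", "V-B", "V-C", "V-D", "V-E", "V-F",
                   "V-G", "V-H", "V-I", "V-K", "V-U"],
        "Type VI": ["VI-A", "VI-B", "VI-C", "VI-D"],
    },
}


def classify(subtype: str) -> Optional[Tuple[str, str]]:
    """Return (class, type) for a recited subtype, or None if not listed."""
    for cls, types in CRISPR_TAXONOMY.items():
        for typ, subtypes in types.items():
            if subtype in subtypes:
                return cls, typ
    return None


print(classify("II-A"))   # ('Class 2', 'Type II')
print(classify("III-E"))  # ('Class 1', 'Type III')
```

A subtype that is not recited above, such as a hypothetical "V-J", returns `None` rather than raising an error.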

[0139] In embodiments, the CRISPR-Cas system and components are those taught in US 8,697,359, the entire contents of which are incorporated herein by reference.

[0140] In embodiments, the CRISPR-Cas system employs a nuclease that can cleave a single stranded nucleotide sequence. In embodiments, the CRISPR-Cas system employs a nuclease that can cleave RNA, mRNA, and/or pre-mRNA.

[0141] In embodiments, the nuclease is a transcription activator-like effector nuclease (TALEN), a meganuclease, or a zinc-finger nuclease. In embodiments, the nuclease is a Cas9, Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14 nuclease. In embodiments, the nuclease is a Cas9 nuclease or a Cpf1 nuclease.

[0142] In embodiments, the nuclease is a TALEN. The term “Transcriptional Activator-Like Element Nuclease,” (TALEN) as used herein, refers to an artificial nuclease comprising a transcriptional activator like effector DNA binding domain and a DNA cleavage domain, for example, a FokI domain. A number of modular assembly schemes for generating engineered TALE constructs have been reported (see e.g., Zhang, Feng; et al. (February 2011). “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription”. Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. “Transcriptional Activators of Human Genes with Programmable DNA-Specificity”. PLoS ONE 6 (5): e19509; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. (2011). “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting”. Nucleic Acids Research; Morbitzer, R.; Elsaesser, J.; Hausner, J.; Lahaye, T. (2011). “Assembly of custom TALE-type DNA binding domains by modular cloning”. Nucleic Acids Research; Li, T.; Huang, S.; Zhao, X.; Wright, D. A.; Carpenter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). “Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes”. Nucleic Acids Research; Weber, E.; Gruetzner, R.; Werner, S.; Engler, C.; Marillonnet, S. (2011). Bendahmane, Mohammed, ed. “Assembly of Designer TAL Effectors by Golden Gate Cloning”. PLoS ONE 6 (5): e19722; the entire contents of each of which are incorporated herein by reference).

[0143] In embodiments, the nuclease is a zinc finger nuclease. The term “zinc finger nuclease,” as used herein, refers to a nuclease comprising a nucleic acid cleavage domain conjugated to a binding domain that comprises a zinc finger array. The term “zinc finger,” as used herein, refers to a small nucleic acid-binding protein structural motif characterized by a fold and the coordination of one or more zinc ions that stabilize the fold. Zinc fingers encompass a wide variety of differing protein structures (see, e.g., Klug A, Rhodes D (1987). “Zinc fingers: a novel protein fold for nucleic acid recognition”. Cold Spring Harb. Symp. Quant. Biol. 52: 473-82, the entire contents of which are incorporated herein by reference). Zinc fingers can be designed to bind a specific sequence of nucleotides, and zinc finger arrays comprising fusions of a series of zinc fingers can be designed to bind virtually any desired target sequence. Such zinc finger arrays can form a binding domain of a protein, for example, of a nuclease, e.g., if conjugated to a nucleic acid cleavage domain. Different types of zinc finger motifs are known to those of skill in the art, including, but not limited to, Cys2His2, Gag knuckle, Treble clef, Zinc ribbon, Zn2/Cys6, and TAZ2 domain-like motifs (see, e.g., Krishna S S, Majumdar I, Grishin N V (January 2003). “Structural classification of zinc fingers: survey and summary”. Nucleic Acids Res. 31 (2): 532-50). In embodiments, the zinc finger array comprises one or more different zinc finger motifs selected from Cys2His2, Gag knuckle, Treble clef, Zinc ribbon, Zn2/Cys6, and TAZ2 domain-like motifs.

[0144] In embodiments, a single zinc finger motif binds 3 or 4 nucleotides of a nucleic acid molecule. In embodiments, a zinc finger domain comprising 2 zinc finger motifs binds 6-8 nucleotides. In embodiments, a zinc finger domain comprising 3 zinc finger motifs binds 9-12 nucleotides. Any suitable protein engineering technique can be employed to alter the DNA-binding specificity of zinc fingers and/or design novel zinc finger fusions to bind virtually any desired target sequence from 3-30 nucleotides in length (see, e.g., Pabo C O, Peisach E, Grant R A (2001). “Design and selection of novel Cys2His2 Zinc finger proteins”. Annual Review of Biochemistry 70: 313-340; Jamieson A C, Miller J C, Pabo C O (2003). “Drug discovery with engineered zinc-finger proteins”. Nature Reviews Drug Discovery 2 (5): 361-368; and Liu Q, Segal D J, Ghiara J B, Barbas C F (May 1997). “Design of polydactyl zinc-finger proteins for unique addressing within complex genomes”. Proc. Natl. Acad. Sci. U.S.A. 94 (11); the entire contents of each of which are incorporated herein by reference). Fusions between engineered zinc finger arrays and protein domains that cleave a nucleic acid can be used to generate a “zinc finger nuclease.”
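The binding footprints recited above follow a simple rule of 3 to 4 nucleotides per zinc finger motif. The sketch below restates that arithmetic; the function names are hypothetical, and the minimum-motif estimate assumes the upper bound of 4 nucleotides per motif, which is a simplification rather than a design rule from this disclosure.

```python
import math
from typing import Tuple


def footprint(n_motifs: int) -> Tuple[int, int]:
    """Return the (min, max) number of nucleotides bound by an array of
    n_motifs zinc finger motifs, at 3 to 4 nucleotides per motif."""
    if n_motifs < 1:
        raise ValueError("n_motifs must be at least 1")
    return (3 * n_motifs, 4 * n_motifs)


def min_motifs(target_len: int) -> int:
    """Fewest motifs whose maximum footprint covers target_len nucleotides."""
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    return math.ceil(target_len / 4)


print(footprint(1))    # (3, 4)
print(footprint(2))    # (6, 8)
print(footprint(3))    # (9, 12)
print(min_motifs(30))  # 8
```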

[0145] In embodiments, the nuclease is a modified form or variant of a Cas9, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, or Cas14 nuclease. In embodiments, the nuclease is a modified form or variant of a TAL nuclease, a meganuclease, or a zinc-finger nuclease. A “modified” or “variant” nuclease is one that is, for example, truncated, fused to another protein (such as another nuclease), catalytically inactivated, etc. In embodiments, the nuclease may have at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to a naturally occurring Cas9, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, Cas14 nuclease, or a TALEN, meganuclease, or zinc-finger nuclease. In embodiments, the nuclease is a Cas9 nuclease variant. In embodiments, the nuclease is a naturally-occurring Cas9 variant. In embodiments, the nuclease is an engineered Cas9 variant. In embodiments, the nuclease is an engineered Cas9 variant in which one or more amino acid residues are replaced with a cysteine. In embodiments, the nuclease is a high fidelity Cas9 variant. In embodiments, the nuclease is a Cas9 nuclease derived from S. pyogenes (SpCas9; SEQ ID NO: 1). In embodiments, a nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a Cas9 nuclease derived from S. pyogenes (SpCas9; SEQ ID NO: 1). In embodiments, the nuclease is SpCas9-HF1. In embodiments, the nuclease is an enhanced SpCas9 (eSpCas9). In embodiments, the nuclease is a Cas9 derived from S. aureus (SaCas9; SEQ ID NO: 133). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a Cas9 derived from S. aureus (SaCas9; SEQ ID NO: 133). In embodiments, the nuclease is a SaCas9-HF nuclease. In embodiments, the nuclease is a KKHSaCas9 nuclease. In embodiments, the nuclease is a catalytically inactive Cas9. In embodiments, the nuclease is dCas9 (SEQ ID NO: 132). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to dCas9 (SEQ ID NO: 132). In embodiments, the Cas9 nuclease is from S. thermophilus (stCas9; SEQ ID NO: 134). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. thermophilus (stCas9; SEQ ID NO: 134). In embodiments, the nuclease is derived from N. meningitidis (nmCas9; SEQ ID NO: 135). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from N. meningitidis (nmCas9; SEQ ID NO: 135). In embodiments, the nuclease is derived from F. novicida (fnCas9; SEQ ID NO: 136). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from F. novicida (fnCas9; SEQ ID NO: 136). In embodiments, the nuclease is derived from C. jejuni (cjCas9; SEQ ID NO: 137). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from C. jejuni (cjCas9; SEQ ID NO: 137). In embodiments, the nuclease is derived from S. canis (scCas9; SEQ ID NO: 138). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. canis (scCas9; SEQ ID NO: 138). In embodiments, the nuclease is derived from S. auricularis (SauriCas9; SEQ ID NO: 139). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cas9 derived from S. auricularis (SauriCas9; SEQ ID NO: 139). In embodiments, the Cas9 variant binds to a protospacer adjacent motif (PAM) including, but not limited to, 5'-NGG-3', 3'-NNGRRT-5', 5'-NNGRRT-3', 5'-NNG-3', or 5'-NNGG-3'.

[0146] In embodiments, the Cpf1 is a Cpf1 enzyme from Acidaminococcus (species BV3L6, UniProt Accession No. U2UMQ6). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cpf1 enzyme from Acidaminococcus (species BV3L6, UniProt Accession No. U2UMQ6). In embodiments, the Cpf1 is a Cpf1 enzyme from Lachnospiraceae (species ND2006, UniProt Accession No. A0A182DWE3). In embodiments, the nuclease has at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a Cpf1 enzyme from Lachnospiraceae. In embodiments, a sequence encoding the nuclease is codon optimized for expression in mammalian cells. In embodiments, the sequence encoding the nuclease is codon optimized for expression in human cells or mouse cells.

[0147] In embodiments, the nuclease is selected from Table F. In embodiments, a nuclease has at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to a nuclease from Table F. In embodiments, the nuclease recognizes a protospacer adjacent motif (PAM). In embodiments, the PAM is selected from Table F.

Table F: Exemplary Nucleases

Guide RNA (gRNA)

[0148] In embodiments, the construct described herein comprises guide RNA (also referred to as gRNA). In embodiments, one or more CPPs are conjugated to one or more guide RNA. In embodiments, the construct described herein comprises an expression vector encoding one or more gRNA. In embodiments, one or more CPPs are conjugated to an expression vector encoding one or more gRNA.

[0149] In embodiments, the gRNA is a single-molecule guide RNA (sgRNA). A sgRNA comprises a spacer sequence and a scaffold sequence. A spacer sequence is a short nucleic acid sequence used to target a nuclease (e.g., a Cas9 nuclease) to a specific nucleotide region of interest (e.g., a genomic DNA sequence to be cleaved). In embodiments, the spacer may be about 17-24 nucleotides in length, such as about 20 nucleotides in length. In embodiments, the spacer may be about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, or about 30 nucleotides in length. In embodiments, the spacer may be at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 nucleotides in length. In embodiments, the spacer may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. In embodiments, the spacer sequence has from about 40% to about 80% GC content.
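As a non-limiting illustration, a candidate spacer can be screened against the length and GC content ranges recited above. The sketch below is not a method of the disclosure; the function names and the example sequence are hypothetical, and the default limits (15 to 30 nucleotides, 40% to 80% GC) are simply the broadest ranges stated in the paragraph.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def spacer_within_ranges(spacer: str,
                         min_len: int = 15, max_len: int = 30,
                         min_gc: float = 40.0, max_gc: float = 80.0) -> bool:
    """Check a spacer against the length and GC ranges (assumed defaults)."""
    return (min_len <= len(spacer) <= max_len
            and min_gc <= gc_content(spacer) <= max_gc)


# Hypothetical 20-nucleotide spacer, used only for illustration.
example = "GACGTTACCGGATCAAGCTG"
print(len(example), gc_content(example), spacer_within_ranges(example))
```

The limits are passed as parameters so that a narrower embodiment (for example, 17-24 nucleotides) can be checked without changing the function.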

[0150] In embodiments, the spacer targets a site that immediately precedes a 5’ protospacer adjacent motif (PAM). The PAM sequence may be selected based on the desired nuclease. For example, the PAM sequence may be any one of the PAM sequences shown in Table E, wherein N refers to any nucleic acid, R refers to A or G, Y refers to C or T, W refers to A or T, and V refers to A or C or G. In embodiments, a spacer may target a sequence of a mammalian gene, such as a human gene. In embodiments, the spacer may target a mutant gene. In embodiments, the spacer may target a coding sequence. In embodiments, the spacer may target an exonic sequence.
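The degenerate nucleotide codes defined above (N, R, Y, W, V) can be expanded into a pattern for locating candidate PAM sites in a sequence. The following is a minimal sketch under stated assumptions: the helper names and the example sequences are hypothetical, only the forward strand is scanned, and only the codes named in the paragraph are supported.

```python
import re

# Degenerate codes as defined in the text, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "R": "[AG]",
    "Y": "[CT]",
    "W": "[AT]",
    "V": "[ACG]",
}


def pam_to_pattern(pam: str) -> str:
    """Translate a PAM written with degenerate codes into a regex pattern."""
    return "".join(IUPAC[symbol] for symbol in pam.upper())


def find_pam_sites(dna: str, pam: str) -> list[int]:
    """Return 0-based start positions of PAM matches on the given strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({pam_to_pattern(pam)}))")
    return [match.start() for match in pattern.finditer(dna.upper())]


print(find_pam_sites("ACGTAGGCTTGGAAT", "NGG"))
print(find_pam_sites("TTGAATCC", "NNGRRT"))
```

A fuller tool would also scan the reverse complement and pair each PAM with its adjacent spacer-length target site; those steps are omitted here.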

[0151] The scaffold sequence is the sequence within the sgRNA that is responsible for nuclease (e.g., Cas9) binding. The scaffold sequence does not include the spacer/targeting sequence. In embodiments, the scaffold may be from about 1 to about 130 nucleotides in length, about 1 to about 10, about 10 to about 20, about 20 to about 30, about 30 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 110, about 110 to about 120, or about 120 to about 130 nucleotides in length. In embodiments, the scaffold may be about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, or about 125 nucleotides in length. In embodiments, the scaffold may be at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, or at least 125 nucleotides in length. In embodiments, the scaffold may be up to 100, up to 110, up to 120 or up to 130 nucleotides in length.

[0152] In embodiments, the gRNA is a dual-molecule guide RNA, e.g., crRNA and tracrRNA. In embodiments, the gRNA may further comprise a polyA tail.

[0153] In embodiments, a compound comprising a CPP conjugated to a nucleic acid comprising a gRNA is provided. In embodiments, a compound comprising a CPP conjugated to an expression vector encoding a gRNA is provided. In embodiments, the nucleic acid comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 gRNAs. In embodiments, the expression vector encodes about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 gRNAs. In embodiments, the gRNAs recognize the same target. In embodiments, the gRNAs recognize different targets. In embodiments, the expression vector comprising a gRNA comprises a sequence encoding a promoter, wherein the promoter drives expression of the gRNA.

Ribonucleoprotein complex (RNP)

[0154] In embodiments, the construct described herein comprises one or more ribonucleoprotein complexes (RNPs). In embodiments, one or more CPPs are conjugated to one or more RNPs.

Exocyclic Peptide

[0155] In embodiments, the construct comprising at least one component of a CRISPR-Cas gene editing system and CPP further comprises an exocyclic peptide (EP). The EP may be referred to interchangeably as a modulatory peptide (MP). In embodiments, the EP can include a peptide that has been identified in the art as a “nuclear localization sequence” (NLS). In embodiments, the EP is coupled to the component of the CRISPR-Cas gene editing system. In embodiments, the EP is coupled to the CPP. In embodiments, the EP is coupled to the component of the CRISPR-Cas gene editing system and the CPP. Coupling between the EP, the component of the CRISPR-Cas gene editing system, CPP, or combinations thereof, may be non-covalent or covalent. In embodiments, the EP is attached through a peptide bond to the N-terminus of the CPP. In embodiments, the EP is attached through a peptide bond to the C-terminus of the CPP. In embodiments, the EP is attached to the CPP through a side chain of an amino acid in the CPP. In embodiments, the EP is attached to the CPP through a side chain of a lysine which is conjugated to the side chain of a glutamine or glutamic acid in the CPP. In embodiments, the EP is attached through a peptide bond to the N-terminus of the nuclease. In embodiments, the EP is attached through a peptide bond to the C-terminus of the nuclease. In embodiments, the EP is attached to the CPP through a side chain of an amino acid in the nuclease. In embodiments, the EP is attached to the CPP through a side chain of a lysine which is conjugated to the side chain of a glutamine or glutamic acid in the nuclease. In embodiments, the EP is conjugated to the 5’ or 3’ end of a guide RNA sequence. In embodiments, the EP is coupled to a linker. In embodiments, the EP is coupled to a linker via the C-terminus of an EP and a CPP through a side chain on the CPP and/or EP. For example, an EP may comprise a terminal lysine which is then coupled to a CPP containing a glutamine through an amide bond. When the EP contains a terminal lysine, and the side chain of the lysine is used to attach the CPP, the C- or N-terminus may be attached to the linker on the component of the CRISPR-Cas gene editing system.

[0156] In embodiments, the EP comprises at least one positively charged amino acid residue, e.g., at least one lysine residue and/or at least one arginine residue. In one embodiment, the EP comprises at least two, at least three or at least four or more lysine residues and/or arginine residues.
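For illustration, the number of lysine (K) and arginine (R) residues in a candidate EP written in one-letter code can be counted as sketched below. The function name is hypothetical, and the sketch counts only the K and R symbols; any other symbol in a sequence is treated as not positively charged.

```python
def count_lys_arg(peptide: str) -> int:
    """Count lysine (K) and arginine (R) residues in a one-letter sequence."""
    return sum(residue in "KR" for residue in peptide.upper())


for ep in ("KK", "KGK", "PKKKRKV"):
    print(ep, count_lys_arg(ep))
```

A sequence satisfies the "at least two" embodiment when the returned count is 2 or more, and likewise for the other thresholds.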

[0157] In embodiments, the EP is selected from KK, KR, RR, KKK, KGK, KBK, KBR, KRK, KRR, RKK, RRR, KKKK, KKRK, KRKK, KRRK, RKKR, RRRR, KGKK, KKGK, KKKKK, KKKRK, KBKBK, KKKRKV, PKKKRKV, PGKKRKV, PKGKRKV, PKKGRKV, PKKKGKV, PKKKRGV and PKKKRKG.

[0158] In embodiments, the EP is selected from KK, KR, RR, KKK, KGK, KBK, KBR, KRK, KRR, RKK, RRR, KKKK, KKRK, KRKK, KRRK, RKKR, RRRR, KGKK, KKGK, KKKKK, KKKRK, KBKBK, KKKRKV, PGKKRKV, PKGKRKV, PKKGRKV, PKKKGKV, PKKKRGV and PKKKRKG.

[0159] In embodiments, the EP comprises an amino acid sequence identified in the art as a nuclear localization sequence (NLS). In an embodiment, the EP comprises an NLS comprising the amino acid sequence PKKKRKV. In embodiments, the EP comprises an NLS comprising an amino acid sequence selected from NLSKRPAAIKKAGQAKKKK, PAAKRVKLD, RQRRNELKRSF, RMRKFKNKGKDTAELRRRRVEVSVELR, KAKKDEQILKRRNV, VSRKRPRP, PPKKARED, PQPKKKPL, SALIKKKKKMAP, DRLRR, PKQKKRK, RKLKKKIKKL, REKKKFLKRR, KRKGDEVDGVDEVAKKKSKK and RKCLQAGMNLEARKTKK.

Linker

[0160] In embodiments, the compounds further comprise a linker (L), which conjugates the CPP to the nuclease, guide sequence (e.g., gRNA), or ribonucleoprotein complex (RNP). In embodiments, a linker (L) conjugates the CPP to the 5' or the 3' end of the guide sequence (e.g., gRNA).

[0161] L may be any appropriate moiety which conjugates the CPP (e.g., as described herein) to a component of the CRISPR-Cas gene editing system. In embodiments, prior to conjugation to the CPP and a component of the CRISPR-Cas gene editing system, the linker has two or more functional groups, each of which are independently capable of forming a covalent bond to the CPP moiety and the component of the CRISPR-Cas gene editing system. In embodiments, the CPP is covalently bound to the N-terminus of the nuclease. In embodiments, the CPP is covalently bound to the C-terminus of the nuclease. In embodiments, the CPP is covalently bound to a side chain of an amino acid in the nuclease. In various embodiments, a linker is covalently bound to the 5' end of the guide RNA or the 3' end of the guide RNA. In embodiments, a linker is covalently bound to the 5' end of the guide RNA. In other embodiments, L is covalently bound to the 3' end of the guide RNA. In still other embodiments, a linker is covalently bound to the backbone of the guide RNA.

[0162] L may be any appropriate moiety which conjugates CPP (e.g., as described herein) to a component of the CRISPR-Cas gene editing system. In embodiments, prior to conjugation to the CPP and the component of the CRISPR-Cas gene editing system, the linker has two or more functional groups, each of which are independently capable of forming a covalent bond to the CPP moiety, the nuclease, or the guide RNA sequence. In embodiments, a linker is covalently bound to a nucleophilic moiety on the nuclease or guide RNA sequence. In embodiments, the nucleophilic moiety is conjugated to the nuclease or guide RNA sequence so that the nuclease can be attached to the CPP through a linker. In embodiments, a linker is covalently bound to a side chain or terminus of an amino acid on the CPP. In certain embodiments, a linker is covalently bound to the side chain of an amino acid on the CPP. In embodiments, a linker is covalently bound to a side chain or terminus of an amino acid on the nuclease. In embodiments, a linker is covalently bound to the 5’ end, 3’ end, or backbone of the guide RNA sequence.

[0163] In embodiments, the linker is a bivalent or trivalent C1-C50 saturated or unsaturated, straight or branched alkyl, wherein 1-25 methylene groups are optionally and independently replaced by -N(H)-, -N(C1-C4 alkyl)-, -N(cycloalkyl)-, -O-, -C(O)-, -C(O)O-, -S-, -S(O)-, -S(O)2-, -S(O)2N(C1-C4 alkyl)-, -S(O)2N(cycloalkyl)-, -N(H)C(O)-, -N(C1-C4 alkyl)C(O)-, -N(cycloalkyl)C(O)-, -C(O)N(H)-, -C(O)N(C1-C4 alkyl), -C(O)N(cycloalkyl), aryl, heteroaryl, cycloalkyl, or cycloalkenyl.

[0164] In embodiments, a linker comprises (i) one or more D or L amino acids, each of which is optionally substituted; (ii) alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each of which is optionally substituted; (iii) -(J-R¹)z-, wherein each R¹ is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each J is independently NR³, -NR³C(O)-, S, or O, wherein R³ is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and z is an integer from 1 to 50; (iv) -(J-R²)x-, wherein each R² is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each J is independently NR³, -NR³C(O)-, S, or O, wherein R³ is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and x is an integer from 1 to 50; (v) -(R¹-J-R²)z-, wherein each of R¹ and R², at each instance, is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each X is independently NR³, -NR³C(O)-, S, or O, wherein R³ is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and z is an integer from 1 to 50; or (vi) combinations thereof.

[0165] In embodiments, a linker comprises (i) β-alanine and lysine residues; (ii) -(J-R¹)z-; (iii) -(J-R²)x-; or (iv) combinations thereof. In embodiments, each R¹ and R² is independently alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each J is independently NR³, -NR³C(O)-, S, or O, wherein R³ is H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted, and x and z are independently an integer from 1 to 50. In embodiments, each R¹ and R² is independently alkylene and each J is O.

[0166] In embodiments, the linker has the structure:

wherein: each AA is independently an amino acid; AAsc is an amino acid side chain; x is an integer from 1 to 10; y is an integer from 1 to 5; and z is an integer from 1 to 10.

[0167] In embodiments, a linker has the following structure: wherein: x is an integer from 2 to 20; y is an integer from 1 to 5; and z is an integer from 2 to 20; wherein AAsc is a side chain of an amino acid residue of the CPP; and wherein M is a bonding group.

[0168] In embodiments, z is an integer from 5 to 15. In one embodiment, z is 11. In embodiments, x is an integer from 1 to 10. In one embodiment, x is 1. In embodiments, the CPP is attached to the nuclease or guide RNA sequence through a linker (“L”). In embodiments, the linker is conjugated to the nuclease or guide RNA sequence through a bonding group (“M”).

[0169] As discussed above, a linker or M may be covalently bound to the nuclease or guide RNA sequence at any suitable location on the nuclease or guide RNA sequence. In various embodiments, L or M is covalently bound to the 3' end of the nuclease or guide RNA sequence or the 5' end of the nuclease or guide RNA sequence. In embodiments, L or M is covalently bound to the backbone of the nuclease or guide RNA sequence.

[0170] In embodiments, a linker is bound to the side chain of aspartic acid, glutamic acid, glutamine, asparagine, or lysine, or a modified side chain of glutamine or asparagine (e.g., a reduced side chain comprising an amino group), on the CPP. In embodiments, the L is bound to the side chain of lysine on the CPP.

[0171] In embodiments, a linker has the following structure:

wherein

M is a group that conjugates L to nuclease or RNA guide sequence;

AAs is a side chain or terminus of an amino acid on the CPP;

AAx is an amino acid; o is an integer from 0 to 10; and p is an integer from 0 to 5.

[0172] In embodiments, a linker has the following structure: wherein

M is a group that conjugates the linker to an oligonucleotide;

AAs is a side chain or terminus of an amino acid on the CPP;

AAx is an amino acid; o is an integer from 0 to 10; and p is an integer from 0 to 5.

[0173] A linker or M may be covalently bound to the nuclease or guide RNA sequence at any suitable location on the nuclease or guide RNA sequence. In various embodiments of the present disclosure, M is covalently bound to a nucleophilic moiety on the nuclease. In embodiments, the nucleophilic moiety is a nitrogen-containing moiety.

[0174] In embodiments, M comprises an alkylene, alkenylene, alkynylene, carbocyclyl, or heterocyclyl, each of which is optionally substituted. In embodiments, M is:

, wherein R is alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl. In embodiments, M is:

wherein: R 1 is alkylene, cycloalkyl, or , wherein m is an integer from 0 to 10. In embodiments, integer from 0 to 10. In embodiments

[0175] In embodiments, M is a heterobifunctional crosslinker, e.g., Q OH , which is disclosed in Williams et al. Curr. Protoc. Nucleic Acid Chem. 2010, 42, 4.41.1-4.41.20, incorporated herein by reference in its entirety.

[0176] In embodiments, m is an integer from 0 to 10, e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, m is an integer from 1 to 5. In embodiments, m is an integer from 1 to 3. In embodiments, m is 1. In embodiments, m is 2. In embodiments, m is 3. In embodiments, m is 4. In embodiments, m is 5.

[0177] In embodiments, AAs is a side chain or terminus of an amino acid on the CPP. Non-limiting examples of AAs include aspartic acid, glutamic acid, glutamine, asparagine, or lysine, or a modified side chain of glutamine or asparagine (e.g., a reduced side chain comprising an amino group).

[0178] In embodiments, each AAx is independently a natural or non-natural amino acid. In embodiments, one or more AAx are a natural amino acid. In embodiments, one or more AAx are a non-natural amino acid. In embodiments, one or more AAx are a β-amino acid. In embodiments, the β-amino acid is β-alanine.

[0179] In embodiments, o is an integer from 0 to 10, e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In embodiments, o is 0, 1, 2, or 3. In embodiments, o is 0. In other embodiments, o is 1. In other embodiments, o is 2. In another embodiment, o is 3.

[0180] In embodiments, p is 0 to 5, e.g., 0, 1, 2, 3, 4, or 5. In embodiments, p is 0. In other embodiments, p is 1. In other embodiments p is 2. In other embodiments, p is 3. In another embodiment, p is 4. In another embodiment, p is 5.

[0181] In embodiments, a linker has a structure according to: wherein M, AAs, each -(R¹-J-R²)z-, and o are defined as described herein; and r is 0 or 1.

[0182] In embodiments, r is 0. In embodiments, r is 1.

[0183] In embodiments, each of R¹ and R², at each instance, are independently selected from alkylene, alkenylene, alkynylene, carbocyclyl, and heterocyclyl, each of which is optionally substituted.

[0184] In embodiments, each J is independently NR³, -NR³C(O)-, S, or O, and wherein R³ is independently H, alkyl, alkenyl, alkynyl, carbocyclyl, or heterocyclyl, each of which is optionally substituted.

[0185] In embodiments, a linker has a structure according to: wherein each of M, AAs, o, p, q and r are defined above.

[0186] In embodiments, q is an integer from 1 to 50, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50, inclusive of all ranges and values therebetween. In other embodiments, q is an integer from 5 to 20. In other embodiments, q is an integer from 10 to 15.

[0187] In embodiments, a linker has a structure according to: wherein M, AAs and o are as defined herein.

[0188] Other non-limiting examples of suitable linker groups include:

wherein M is a group that conjugates a linker to a nuclease, guide sequence, or exocyclic peptide; and AAs is a side chain or terminus of an amino acid on the CPP.

[0189] In embodiments, a linker and M comprise the following structure: wherein m is an integer from 0 to 10;

NU is a nuclease or guide RNA associated with the nuclease; and AAs is a side chain or terminus of an amino acid on the CPP.

[0190] In embodiments, the present disclosure provides a compound comprising the following structure: wherein:

EP, M, NU (nuclease or guide RNA associated with the nuclease), x, y, and z are as defined above, and AAsc comprises a side chain of an amino acid residue on the CPP.

[0191] In embodiments, a precursor to L also contains a thiol (-SH) group, which forms a disulfide bond with the side chain of cysteine or cysteine analog located on the nuclease. Accordingly, in various embodiments, the compounds disclosed herein (e.g., the compounds for comprise the following structure:

[0192] In embodiments, the disulfide bond is formed between a thiol group on L, and the side chain of cysteine or an amino acid analog having a thiol group on the nuclease. Such thiol-containing side chains may be located on native amino acids of the nuclease, or such thiol-containing amino acids may be introduced on the nuclease. Non-limiting examples of amino acid analogs having a thiol group which can be used with the polypeptide conjugates disclosed herein include:

[0193] In embodiments, L is wherein AAs is a side chain or terminus of an amino acid on the CPP.

[0194] In embodiments, a disulfide bond is formed between a thiol group on L, and the side chain of cysteine on the nuclease. In embodiments, the cysteine may be a constituent of the nuclease or the nuclease may be modified to include cysteine or an amino acid analog having a thiol group. In embodiments, any suitable functional group of the nuclease may be modified to form a thiol group for bonding to the linker L.

Polynucleotides and Expression Vectors

Polynucleotides

[0195] Provided herein are nucleic acid molecules comprising a nucleic acid sequence encoding a nuclease, a modified looped nuclease, or a gRNA as described herein. The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. “Oligonucleotide” generally refers to polynucleotides of between about 5 and about 100 nucleotides of single- or double-stranded DNA. However, for the purposes of this disclosure, there is no upper limit to the length of an oligonucleotide. Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes, or chemically synthesized by methods known in the art. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded and double-stranded polynucleotides.

[0196] Terms used to describe sequence relationships between two or more polynucleotides or polypeptides include “reference sequence,” “comparison window,” “sequence identity,” “percentage of sequence identity,” and “substantial identity”. A “reference sequence” is at least 12 but frequently 15 to 18 and often at least 25 monomer units, inclusive of nucleotides and amino acid residues, in length. Because two polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a “comparison window” to identify and compare local regions of sequence similarity. A “comparison window” refers to a conceptual segment of at least 6 contiguous positions, usually about 50 to about 100, more usually about 100 to about 150 in which a sequence is compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. The comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by computerized implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as for example disclosed by Altschul et al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc, 1994-1998, Chapter 15.

[0197] The recitations “sequence identity” or, for example, comprising a “sequence 50% identical to,” as used herein, refer to the extent that sequences are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis over a window of comparison. Thus, a “percentage of sequence identity” may be calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
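The calculation described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the two sequences have already been optimally aligned (for example, by one of the programs named herein) and are supplied as equal-length strings in which "-" marks a gap; the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a window of comparison.

    Both inputs must be pre-aligned and of equal length; "-" denotes a gap.
    Matched positions are divided by the window size and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    matched = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matched / len(aligned_a)


# One mismatch in a 10-position window.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Gap positions are counted in the window size but never as matches, which follows the definition above of dividing by the total number of positions in the window of comparison.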

[0198] As used herein, the terms “polynucleotide variant” and “variant” and the like refer to polynucleotides displaying substantial sequence identity with a reference polynucleotide sequence or polynucleotides that hybridize with a reference sequence under stringent conditions that are defined hereinafter. These terms include polynucleotides in which one or more nucleotides have been added or deleted, or replaced with different nucleotides compared to a reference polynucleotide. In this regard, it is well understood in the art that certain alterations inclusive of mutations, additions, deletions, and substitutions can be made to a reference polynucleotide whereby the altered polynucleotide retains the biological function or activity of the reference polynucleotide.

[0199] In embodiments, polynucleotides or variants have at least or about 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to a reference sequence.

[0200] The polynucleotides contemplated herein, regardless of the length of the coding sequence itself, may be combined with other DNA sequences, such as promoters and/or enhancers, untranslated regions (UTRs), signal sequences, Kozak sequences, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, internal ribosomal entry sites (IRES), recombinase recognition sites (e.g., LoxP, FRT, and Att sites), termination codons, transcriptional termination signals, and polynucleotides encoding self-cleaving polypeptides, epitope tags, as disclosed elsewhere herein or as known in the art, such that their overall length may vary considerably. It is therefore contemplated that a polynucleotide fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol. Polynucleotides can be prepared, manipulated and/or expressed using any of a variety of well-established techniques known and available in the art.

Promoters and Signal sequences

[0201] In embodiments, a vector may also comprise a sequence encoding a signal peptide (e.g., for nuclear localization, nucleolar localization, mitochondrial localization), fused to the polynucleotide encoding the modified looped nuclease. For example, a vector may comprise a nuclear localization sequence (e.g., from SV40 or cMyc) fused to the polynucleotide encoding the modified looped nuclease. Exemplary nuclear localization sequences are provided below:

SV40: PKKKRKV (SEQ ID NO: 128)

NLP: AVKRPAATKKAGQAKKKKLD (SEQ ID NO: 129)

TUS: KLKIKRPVK (SEQ ID NO: 130)

EGL-13: MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO: 131)

Vectors

[0202] The term “vector” is used herein to refer to a nucleic acid molecule capable of transferring or transporting another nucleic acid molecule. The transferred nucleic acid is generally linked to, e.g., inserted into, the vector nucleic acid molecule. A vector may include sequences that direct autonomous replication in a cell, or may include sequences sufficient to allow integration into host cell DNA.

[0203] The term “expression cassette” as used herein refers to genetic sequences within a vector which can express an RNA, and subsequently a protein. The nucleic acid cassette contains the gene of interest, e.g., a modified looped nuclease. The nucleic acid cassette is positionally and sequentially oriented within the vector such that the nucleic acid in the cassette can be transcribed into RNA, and when necessary, translated into a protein or a polypeptide, undergo appropriate post-translational modifications required for activity in the transformed cell, and be translocated to the appropriate compartment for biological activity by targeting to appropriate intracellular compartments or secretion into extracellular compartments. Preferably, the cassette has its 3' and 5' ends adapted for ready insertion into a vector, e.g., it has restriction endonuclease sites at each end. The cassette can be removed and inserted into a plasmid or viral vector as a single unit. In embodiments, the nucleic acid cassette contains the sequence of a modified looped nuclease.

[0204] Exemplary vectors include, without limitation, plasmids, phagemids, cosmids, transposons, artificial chromosomes such as yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), or P1-derived artificial chromosome (PAC), bacteriophages such as lambda phage or M13 phage, and animal viruses. Examples of categories of animal viruses useful as vectors include, without limitation, retrovirus (including lentivirus), adenovirus, adeno-associated virus, herpesvirus (e.g., herpes simplex virus), poxvirus, baculovirus, papillomavirus, and papovavirus (e.g., SV40). Examples of expression vectors are pCIneo vectors (Promega) for expression in mammalian cells; pLenti4/V5-DEST™, pLenti6/V5-DEST™, and pLenti6.2/V5-GW/lacZ (Invitrogen) for lentivirus-mediated gene transfer and expression in mammalian cells. In particular embodiments, the coding sequences of the modified looped nuclease disclosed herein can be ligated into such expression vectors for the expression of the modified looped nuclease in host cells. In embodiments, non-viral vectors are used to deliver one or more polynucleotides contemplated herein to a host cell.

[0205] In embodiments, the vector is a non-integrating vector, including but not limited to, an episomal vector or a vector that is maintained extrachromosomally. As used herein, the term “episomal” refers to a vector that is able to replicate without integration into the host’s chromosomal DNA and without gradual loss from a dividing host cell, also meaning that the vector replicates extrachromosomally or episomally. The vector is engineered to harbor the sequence coding for the origin of DNA replication or “ori” from a lymphotrophic herpes virus or a gamma herpesvirus, an adenovirus, SV40, a bovine papilloma virus, or a yeast, specifically a replication origin of a lymphotrophic herpes virus or a gamma herpesvirus corresponding to oriP of EBV. In a particular aspect, the lymphotrophic herpes virus may be Epstein Barr virus (EBV), Kaposi's sarcoma herpes virus (KSHV), Herpes virus saimiri (HS), or Marek's disease virus (MDV). Epstein Barr virus (EBV) and Kaposi's sarcoma herpes virus (KSHV) are also examples of a gamma herpesvirus. Typically, the host cell comprises the viral replication transactivator protein that activates the replication.

[0206] In embodiments, a polynucleotide is introduced into a target or host cell using a transposon vector system. In certain embodiments, the transposon vector system comprises a vector comprising transposable elements and a polynucleotide contemplated herein; and a transposase. In one embodiment, the transposon vector system is a single transposase vector system, see, e.g., WO 2008/027384. Exemplary transposases include, but are not limited to: piggyBac, Sleeping Beauty, Mos1, Tc1/mariner, Tol2, mini-Tol2, Tc3, MuA, Himar1, Frog Prince, and derivatives thereof. The piggyBac transposon and transposase are described, for example, in U.S. Patent 6,962,810, which is incorporated herein by reference in its entirety. The Sleeping Beauty transposon and transposase are described, for example, in Izsvak et al., J. Mol. Biol. 302:93-102 (2000), which is incorporated herein by reference in its entirety. The Tol2 transposon, which was first isolated from the medaka fish Oryzias latipes and belongs to the hAT family of transposons, is described in Kawakami et al. (2000). Mini-Tol2 is a variant of Tol2 and is described in Balciunas et al. (2006). The Tol2 and Mini-Tol2 transposons facilitate integration of a transgene into the genome of an organism when co-acting with the Tol2 transposase. The Frog Prince transposon and transposase are described, for example, in Miskey et al., Nucleic Acids Res. 31:6873-6881 (2003).

[0207] The “control elements” or “regulatory sequences” present in an expression vector are those non-translated regions of the vector (e.g., origin of replication, selection cassettes, promoters, enhancers, translation initiation signals (Shine Dalgarno sequence or Kozak sequence), introns, a polyadenylation sequence, 5' and 3' untranslated regions) which interact with host cellular proteins to carry out transcription and translation. Such elements may vary in their strength and specificity. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including ubiquitous promoters and inducible promoters, may be used. In embodiments, the polynucleotide of interest is operably linked to a control element or regulatory sequence. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a polynucleotide sequence if the promoter affects the transcription or expression of the polynucleotide sequence.

[0208] In embodiments, the polynucleotide of interest is operably linked to a promoter sequence. The term “promoter” as used herein refers to a recognition site of a polynucleotide (DNA or RNA) to which an RNA polymerase binds. An RNA polymerase initiates and transcribes polynucleotides operably linked to the promoter. Illustrative ubiquitous promoters suitable for use in particular embodiments include, but are not limited to, a cytomegalovirus (CMV) immediate early promoter, a viral simian virus 40 (SV40) (e.g., early or late) promoter, a spleen focus forming virus (SFFV) promoter, a Moloney murine leukemia virus (MoMLV) LTR promoter, a Rous sarcoma virus (RSV) LTR, a herpes simplex virus (HSV) (thymidine kinase) promoter, H5, P7.5, and P11 promoters from vaccinia virus, an elongation factor 1-alpha (EF1a) promoter, early growth response 1 (EGR1) promoter, a ferritin H (FerH) promoter, a ferritin L (FerL) promoter, a glyceraldehyde 3-phosphate dehydrogenase (GAPDH) promoter, a eukaryotic translation initiation factor 4A1 (EIF4A1) promoter, a heat shock 70kDa protein 5 (HSPA5) promoter, a heat shock protein 90kDa beta, member 1 (HSP90B1) promoter, a heat shock protein 70kDa (HSP70) promoter, a β-kinesin (β-KIN) promoter, the human ROSA 26 locus (Irions et al., Nature Biotechnology 25, 1477-1482 (2007)), a Ubiquitin C (UBC) promoter, a phosphoglycerate kinase-1 (PGK) promoter, a cytomegalovirus enhancer/chicken β-actin (CAG) promoter, a β-actin promoter and a myeloproliferative sarcoma virus enhancer, negative control region deleted, dl587rev primer-binding site substituted (MND) promoter (Challita et al., J Virol. 69(2):748-55 (1995)).

[0209] Illustrative methods of non-viral delivery of polynucleotides contemplated in particular embodiments include, but are not limited to: electroporation, sonoporation, lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, nanoparticles, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, DEAE-dextran-mediated transfer, gene gun, and heat-shock.

[0210] Illustrative examples of polynucleotide delivery systems suitable for use in particular embodiments include, but are not limited to, those provided by Amaxa Biosystems, Maxcyte, Inc., BTX Molecular Delivery Systems, and Copernicus Therapeutics Inc. Lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides have been described in the literature. See, e.g., Liu et al. (2003) Gene Therapy. 10:180-187; and Balazs et al. (2011) Journal of Drug Delivery. 2011:1-12. Antibody-targeted, bacterially derived, non-living nanocell-based delivery is also contemplated in particular embodiments.

Protein Expression Systems

[0211] In embodiments, a vector comprising an expression cassette comprising a nucleic acid sequence encoding a modified looped nuclease described herein is introduced into a host cell that is capable of expressing the encoded modified looped nuclease. Exemplary host cells include Chinese Hamster Ovary (CHO) cells, HEK 293 cells, BHK cells, murine NS0 cells, murine SP2/0 cells, and E. coli cells. The expressed protein is then purified from the culture system using any one of a variety of methods known in the art (e.g., Protein A columns, affinity chromatography, size-exclusion chromatography, and the like).

[0212] Numerous expression systems exist that are suitable for use in producing the modified loop proteins described herein. Eukaryote-based systems in particular can be employed to produce nucleic acid sequences, or their cognate polypeptides, proteins and peptides. Many such systems are commercially and widely available.

[0213] In embodiments, the modified loop proteins described herein are produced using Chinese Hamster Ovary (CHO) cells following standardized protocols. Alternatively, for example, transgenic animals may be utilized to produce the modified loop proteins described herein, generally by expression into the milk of the animal using well established transgenic animal techniques. Lonberg N. Human antibodies from transgenic animals. Nat Biotechnol. 2005 Sep;23(9):1117-25; Kipriyanov et al. Generation and production of engineered antibodies. Mol Biotechnol. 2004 Jan;26(1):39-60; See also Ko et al., Plant biopharming of monoclonal antibodies. Virus Res. 2005 Jul;111(1):93-100.

[0214] The insect cell/baculovirus system can produce a high level of protein expression of a heterologous nucleic acid segment, such as described in U.S. Patent Nos. 5,871,986 and 4,879,236, both incorporated herein by reference in their entireties, and which can be bought, for example, under the name MAXBAC® 2.0 from Invitrogen and BACPACK™ Baculovirus expression system from Clontech.

[0215] Other examples of expression systems include Stratagene's Complete Control Inducible Mammalian Expression System, which utilizes a synthetic ecdysone-inducible receptor. Another example of an inducible expression system is available from Invitrogen, which carries the T-REX™ (tetracycline-regulated expression) System, an inducible mammalian expression system that uses the full-length CMV promoter. Invitrogen also provides a yeast expression system called the Pichia methanolica Expression System, which is designed for high-level production of recombinant proteins in the methylotrophic yeast Pichia methanolica. One of skill in the art would know how to express vectors such as an expression construct comprising a nucleic acid sequence encoding a modified looped nuclease described herein, to produce its encoded nucleic acid sequence or its cognate polypeptide, protein, or peptide. See, generally, Recombinant Gene Expression Protocols By Rocky S. Tuan, Humana Press (1997), ISBN 0896033333; Advanced Technologies for Biopharmaceutical Processing By Roshni L. Dutton, Jeno M. Scharer, Blackwell Publishing (2007), ISBN 0813805171; Recombinant Protein Production With Prokaryotic and Eukaryotic Cells By Otto-Wilhelm Merten, Contributor European Federation of Biotechnology, Section on Microbial Physiology Staff, Springer (2001), ISBN 0792371372.

[0216] As an alternative, proteins of the present invention can be synthesized by exclusive solid phase synthesis, partial solid phase methods, fragment condensation or classical solution synthesis. These synthesis methods are well-known to those of skill in the art (see, for example, Merrifield, J. Am. Chem. Soc. 85:2149 (1963), Stewart et al., “Solid Phase Peptide Synthesis” (2nd Edition), (Pierce Chemical Co. 1984), Bayer and Rapp, Chem. Pept. Prot. 3:3 (1986), Atherton et al., Solid Phase Peptide Synthesis: A Practical Approach (IRL Press 1989), Fields and Colowick, “Solid-Phase Peptide Synthesis,” Methods in Enzymology Volume 289 (Academic Press 1997), and Lloyd-Williams et al., Chemical Approaches to the Synthesis of Peptides and Proteins (CRC Press, Inc. 1997)). Variations in total chemical synthesis strategies, such as “native chemical ligation” and “expressed protein ligation” are also standard (see, for example, Dawson et al., Science 266:776 (1994), Hackeng et al., Proc. Nat'l Acad. Sci. USA 94:7845 (1997), Dawson, Methods Enzymol. 287:34 (1997), Muir et al., Proc. Nat'l Acad. Sci. USA 95:6705 (1998), and Severinov and Muir, J. Biol. Chem. 273:16205 (1998)). In one example of expressed protein ligation, a recombinantly expressed protein is cleaved from an intein and the protein is ligated to a peptide containing an N-terminal cysteine having an unoxidized sulfhydryl side chain, by contacting the protein with the peptide in a reaction solution containing a conjugated thiophenol. This forms a C-terminal thioester of the recombinant protein which spontaneously rearranges intramolecularly to form an amide bond linking the protein to the peptide. See, generally, Muir, TW et al., Expressed Protein Ligation: A General Method for Protein Engineering, PNAS (1998) 95(12):6705-6710; US Pat. No. 6,849,428; US Pub. 2002/0151006; Bondalapati et al., Expanding the chemical toolbox for the synthesis of large and uniquely modified proteins. (2016) Nature Chemistry volume 8, pages 407-418; Amy E. Rabideau and Bradley Lether Pentelute. Delivery of Non-Native Cargo into Mammalian Cells Using Anthrax Lethal Toxin. ACS Chem. Biol. (2016) 11(6):1490-1501; and Weidmann et al., Copying Life: Synthesis of an Enzymatically Active Mirror-Image DNA-Ligase Made of D-Amino Acids. Cell Chemical Biology, (2019 May 16) 26(5):616-619.

Transfection of target cells

[0217] Any appropriate means may be used to introduce a component of a CRISPR-Cas gene editing system, including, for example, a nuclease, a modified looped nuclease, a gRNA, or a ribonucleoprotein complex (RNP) to the target cell. In embodiments, the compound comprising a component of a CRISPR-Cas gene editing system is transfected into the target cell by electroporation, lipofection, viral-delivery, or incubation of the target cell and modified looped nuclease. In embodiments, the nuclease or modified looped nuclease is delivered to the target cell as a ribonucleoprotein comprising guide RNA conjugated to a nuclease, for example, a modified loop nuclease. In embodiments, the RNP comprises a recombinant Cas9 protein and an sgRNA or a crRNA:tracrRNA duplex. In embodiments, the nuclease or modified looped nuclease is delivered to the target cell separately from the guide RNA. In embodiments, the target cell is transfected with the guide RNA (e.g., sgRNA) and then incubated with a nuclease, for example, a modified looped nuclease.

[0218] In embodiments, the CPPs comprise one or more cell- or tissue-targeting moieties. Without being bound by theory, the conjugation of these types of moieties targets the CPP and cargo to the desired cell or tissue type to improve the efficiency of uptake of the CPP and cargo into the targeted cell or tissue. A homing peptide for a desired cell or tissue-targeting moiety can be identified by biopanning of phage displayed libraries. In embodiments, the cell- or tissue-targeting moiety is a polypeptide or fragment thereof that binds to a cell (e.g., an antibody, ligand, or polypeptide that binds a carbohydrate). In embodiments, the cell- or tissue-targeting moiety delivers the CPP conjugates of the present disclosure to any desired cell or tissue. In embodiments, the cell or tissue includes, but is not limited to, breast, prostate, colon, brain, liver, neuronal, central nervous system, muscle, cardiac muscle, smooth muscle, skeletal muscle, lung, heart, epithelial tissue, vascular tissue, gastrointestinal tract, spinal cord, tumor, solid tumor, and a cancer cell.

Analysis of enzyme activity

[0219] An advantage of the CRISPR system is the ability to target any sequence in a DNA sequence that contains a PAM motif on either strand of DNA for editing. Nuclease binding depends on the complementary base pairing of the guide RNA to the DNA target to produce a targeted double-strand break in the DNA. This break is then repaired by the endogenous cellular repair machinery and can lead to local insertion and/or deletion events via the nonhomologous end-joining pathway or to precise sequence modification via homology-directed repair when a user-defined donor template is provided. However, not all guide RNAs are equally effective at directing the nuclease-mediated DNA modifications and the conjugates disclosed herein may show different stability in vitro, ex vivo, or in vivo.
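To make the "PAM on either strand" point concrete, the following Python sketch scans a DNA string and its reverse complement for candidate target sites. It is an illustration under stated assumptions, not a method taken from the disclosure: it uses the 5'-NGG PAM that this document associates with SpCas9, assumes a 20-nucleotide protospacer immediately 5' of the PAM (a common convention, not specified here), and uses hypothetical function names.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_ngg_targets(dna: str, protospacer_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG PAM on either strand.

    Assumption: the protospacer is the protospacer_len bases directly 5' of
    the PAM. 'start' is 0-based and, for the '-' strand, is measured along the
    reverse complement rather than along the input string.
    """
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))"
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for match in re.finditer(pattern, seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Toy sequence with a single site on the '+' strand.
    for hit in find_ngg_targets("A" * 20 + "TGG"):
        print(hit)
```

Finding a PAM-adjacent site says nothing about how well a given guide RNA will perform, which is the caveat raised at the end of the paragraph above.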

[0220] Any appropriate assay may be used to assess genome editing by a construct of the present disclosure. In embodiments, the assay includes, but is not limited to, a T7 endonuclease 1 (T7E1) mismatch detection assay, next-generation sequencing (NGS), tracking of indels by decomposition (TIDE) assay, Indel Detection by Amplicon Analysis (IDAA), and a DNA cleavage assay. In embodiments, nuclease activity is assayed in vitro. In embodiments, nuclease activity is assayed ex vivo. In embodiments, the nuclease activity is assayed in vivo. In embodiments, the assay is a cell-based assay. In embodiments, the assay is a synthetic assay. In embodiments, the synthetic assay comprises one or more substrate DNA sequences known to be targets of the gRNA or RNP.
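As an illustration of how a mismatch detection assay such as T7E1 is often quantified, the sketch below applies a commonly used estimate, indel % = 100 x (1 - sqrt(1 - fraction cleaved)), to gel band intensities. This formula is not stated in the present disclosure; it, the function name, and the example intensities are assumptions introduced here for illustration only.

```python
import math
from typing import Sequence


def t7e1_indel_percent(uncut: float, cut_bands: Sequence[float]) -> float:
    """Estimate indel frequency (%) from densitometry of a T7E1 digest.

    uncut: intensity of the undigested parental band.
    cut_bands: intensities of the cleavage product bands.
    Assumption: the widely used estimate 100 * (1 - sqrt(1 - f)), where f is
    the cleaved fraction of total band intensity.
    """
    total = uncut + sum(cut_bands)
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = sum(cut_bands) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


if __name__ == "__main__":
    # Hypothetical intensities: half of the signal is in the cleaved bands.
    print(round(t7e1_indel_percent(50.0, [25.0, 25.0]), 1))
```

Sequencing-based readouts such as NGS or TIDE report indel frequencies directly and do not need this correction.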

Compositions

[0221] Provided herein are compositions comprising a nuclease, for example, a CRISPR-associated nuclease, for example, a modified looped nuclease. In embodiments, the compositions comprise a gRNA. The characteristics of gRNAs are described throughout this disclosure. In embodiments, the compositions comprise an unmodified nuclease. An unmodified nuclease does not comprise a CPP.

[0222] In embodiments, the compositions described herein comprise at least one modified nuclease, at least two nucleases, at least three nucleases, or more. In embodiments, the compositions described herein comprise at least one modified looped nuclease, at least two modified looped nucleases, at least three modified loop nucleases, or more.

Numbered Embodiments

[0223] Embodiment 1. A modified looped nuclease comprising at least one loop region, wherein the at least one loop region comprises a cell penetrating peptide (CPP) sequence inserted into the loop region.

[0224] Embodiment 2. The modified looped nuclease of embodiment 1, wherein the looped nuclease is selected from Cas9, Cas9 variant, Cas12a (Cpf1), Cas12b, Cas12c, Tnp-B like, Cas13a (C2c2), Cas13b, and Cas14 nuclease.

[0225] Embodiment 3. The modified looped nuclease of embodiment 0, wherein the nuclease is Cas9 or a Cas9 variant.

[0226] Embodiment 4. The modified looped nuclease of embodiment 2, wherein the nuclease comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% sequence identity to the nuclease of claim 2.

[0227] Embodiment 5. The modified looped nuclease of embodiment 1, comprising a detectable tag.

[0228] Embodiment 6. The modified looped nuclease of embodiment 5, wherein the detectable tag is selected from a FLAG tag, a polyhistidine tag, a SNAP tag, a Halo tag, cMyc, glutathione-S-transferase, avidin, an enzyme, a fluorescent protein, a luminescent protein, a chemiluminescent protein, a bioluminescent protein, and a phosphorescent protein.

[0229] Embodiment 7. The modified looped nuclease of any one of embodiments 1-6, comprising a guide RNA (gRNA).

[0230] Embodiment 8. The modified looped nuclease of any one of embodiments 1-7, wherein the CPP sequence comprises at least three arginines, or analogs thereof.

[0231] Embodiment 9. The modified looped nuclease of any one of embodiments 1-8, wherein the CPP comprises from three to six arginines, or analogs thereof.

[0232] Embodiment 10. The modified looped nuclease of any one of embodiments 1-9, wherein the CPP comprises at least one amino acid with a hydrophobic side chain.

[0233] Embodiment 11. The modified looped nuclease of embodiment 10, wherein the CPP comprises from one to six amino acids with a hydrophobic side chain.

[0234] Embodiment 12. The modified looped nuclease of embodiment 10, wherein the amino acids with a hydrophobic side chain are independently selected from glycine, alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline, naphthylalanine, phenylglycine, homophenylalanine, tyrosine, cyclohexylalanine, piperidine-2-carboxylic acid, cyclohexylalanine, norleucine, 3-(3-benzothienyl)-alanine, 3-(2-quinolyl)-alanine, O-benzylserine, 3-(4-(benzyloxy)phenyl)-alanine, S-(4-methylbenzyl)cysteine, N-(naphthalen-2-yl)glutamine, 3-(1,1'-biphenyl-4-yl)-alanine, tert-leucine, or nicotinoyl lysine, each of which is optionally substituted with one or more substituents.

[0235] Embodiment 13. The modified looped nuclease of any one of embodiments 10-12, wherein at least one of the amino acids with a hydrophobic side chain is tryptophan.

[0236] Embodiment 14. The modified looped nuclease of any one of embodiments 10-12, wherein each of the at least one of the amino acids with a hydrophobic side chain is tryptophan.

[0237] Embodiment 15. The modified looped nuclease of any one of embodiments 8-14, wherein the CPP sequence comprises at least three arginines and at least three tryptophans.

[0238] Embodiment 16. The modified looped nuclease of any one of embodiments 1-15, wherein the CPP sequence comprises from one to six D-amino acids.

[0239] Embodiment 17. The modified looped nuclease of any one of embodiments 1-16, comprising a first looped region and a second looped region, wherein a first CPP sequence is inserted into the first looped region, and a second CPP sequence is inserted into the second looped region.

[0240] Embodiment 18. The modified looped nuclease of embodiment 17, wherein the first CPP comprises at least three arginines, and the second CPP comprises at least three amino acids with a hydrophobic side chain.

[0241] Embodiment 19. The modified looped nuclease of any one of embodiments 1-18, wherein the CPP sequence is independently selected from Table D.

[0242] Embodiment 20. A recombinant nucleic acid molecule encoding the modified looped nuclease of any one of embodiments 1-19.

[0243] Embodiment 21. An expression cassette comprising the recombinant nucleic acid molecule of embodiment 20 operably linked to a promoter.

[0244] Embodiment 22. A vector comprising the expression cassette of embodiment 21.

[0245] Embodiment 23. A host cell comprising the vector of embodiment 22.

[0246] Embodiment 24. The host cell of embodiment 23, wherein the host cell is selected from a Chinese Hamster Ovary (CHO) cell, an HEK 293 cell, a BHK cell, a murine NS0 cell, a murine SP2/0 cell, or an E. coli cell.

[0247] Embodiment 25. A composition comprising a modified looped nuclease of any one of embodiments 1-19 and a gRNA.

[0248] Embodiment 26. A method of producing the modified looped nuclease of any one of embodiments 1-19, comprising culturing the host cell of claim 0 and purifying the expressed modified looped nuclease from the supernatant.

[0249] Embodiment 27. A method of treating a disease or condition, comprising administering a modified looped nuclease of any one of embodiments 1-19.

[0250] Embodiment 28. A method of gene editing, comprising administering a modified looped nuclease of any one of embodiments 1-19.

Examples

Example 1: Intracellular delivery of a Modified Looped Nuclease

[0251] Modified Looped Nuclease. Modified looped Cas9 nucleases comprising a CPP from Table D are prepared. FIG. 1 highlights the secondary structure of Cas9 (SEQ ID NO: 1). Loop regions are double underlined, helices are single underlined, and beta strands are highlighted in bold and italics. Table E shows the amino acid ranges of Cas9 which contain loops. The amino acid ranges are numbered with respect to SEQ ID NO: 1. The CPP was inserted immediately prior to an amino acid within a loop region, immediately after an amino acid within a loop region of Cas9, or in place of one or more amino acids within loop regions. The CPP is labeled with a fluorescent dye.
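The three insertion modes described in this paragraph (before a loop residue, after a loop residue, or in place of one or more loop residues) can be sketched at the sequence level as follows. This is a minimal illustration only: the function name, the toy protein string, and the placeholder CPP "RRRWWW" are hypothetical and are not sequences from Table D or SEQ ID NO: 1, and the sketch does not check that the chosen position actually lies in a loop range of Table E.

```python
def insert_cpp(protein: str, cpp: str, position: int,
               mode: str = "before", replace_len: int = 1) -> str:
    """Return a protein sequence with a CPP inserted at a 1-based position.

    mode="before":  CPP placed immediately prior to the residue at position.
    mode="after":   CPP placed immediately after the residue at position.
    mode="replace": CPP replaces replace_len residues starting at position.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    i = position - 1  # convert 1-based residue numbering to a 0-based index
    if mode == "before":
        return protein[:i] + cpp + protein[i:]
    if mode == "after":
        return protein[:i + 1] + cpp + protein[i + 1:]
    if mode == "replace":
        if replace_len < 1 or i + replace_len > len(protein):
            raise ValueError("replace_len runs past the end of the sequence")
        return protein[:i] + cpp + protein[i + replace_len:]
    raise ValueError("mode must be 'before', 'after', or 'replace'")


if __name__ == "__main__":
    toy = "MKTAYIAKQR"   # hypothetical 10-residue stand-in for a nuclease
    cpp = "RRRWWW"       # placeholder CPP, not taken from Table D
    print(insert_cpp(toy, cpp, 5, "before"))
    print(insert_cpp(toy, cpp, 5, "after"))
    print(insert_cpp(toy, cpp, 5, "replace", replace_len=2))
```

In practice the position would be chosen from the loop ranges of Table E, numbered with respect to SEQ ID NO: 1.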

[0252] Study Design. HeLa cells are cultured in six-well plates (5 x 10^5 cells per well) for 24 hours. After 24 hours, HeLa cells are transfected with sgRNA and then incubated with a modified looped nuclease (e.g., Cas9 comprising a CPP). As a negative control, the HeLa cells are incubated with an unmodified nuclease (e.g., Cas9) in the absence of CPP. The uptake efficiency of the modified looped nuclease is compared to uptake of the unmodified looped nuclease using fluorescence.

Example 2: Gene-Editing using a Modified Looped Nuclease

[0253] Study Design. The ability of various modified looped nucleases of Example 1 to cleave human target DNA by CRISPR is evaluated. Modified looped nucleases conjugated to a gRNA and compositions comprising a modified looped nuclease and a gRNA which is not conjugated to the modified looped nuclease are evaluated. The protocol described in U.S. Publication No. 2014/0068797, which is incorporated by reference herein in its entirety, is used to evaluate the ability of the compositions to make gene edits.

[0254] Briefly, human HEK293T cells are transfected with the modified looped nuclease and/or gRNA. Western blotting is performed to confirm that the modified looped nuclease enters the HEK293T cells. Northern blotting is performed to confirm that the gRNA enters the HEK293T cells. Fluorescence microscopy is performed to visualize Cas9. A Surveyor assay is performed to assess site-specific genome cleavage.

Example 3: In vitro evaluation of nuclease activity

[0255] Study Design. The conjugation efficiency and retention of nuclease activity of the various modified looped nucleases of Example 1 are tested. CRISPR/Cas9 can be used in a ribonucleoprotein (RNP) format, in which the recombinant Cas9 protein is assembled in vitro with chemically synthesized sgRNA or a crRNA:tracrRNA duplex. To form the RNP, sgRNA, or duplex RNA prepared by annealing crRNA to tracrRNA, is mixed with a modified looped nuclease of Example 1.

[0256] The DNA cleavage activity of the RNP is assayed on PCR products of chosen amplified genes by mixing the RNP with the PCR product of an amplified gene and incubating at 37°C for 2 hours. After incubation, 1 µl Proteinase K is added to the reaction, and the mixture is then incubated at 65°C for 10 minutes to release the DNA from the RNP. The products of each reaction are assessed by electrophoresis on a 2% agarose gel to visualize cleavage of the PCR product.

Example 4: Intracellular delivery of a modified nuclease ribonucleoprotein (RNP)

[0257] The conjugation of the nuclease to a cell-penetrating peptide allows the conjugate to be introduced to a cell without needing viral-delivery, lipofection, or other transfection methods that may introduce artefacts. Additionally, the use of RNP can reduce off-target effects versus plasmid transfection and can be less stressful on the cells.

[0258] Study Design. Cas9 RNPs are prepared immediately before experiments by incubating 20 µM Cas9 with 20 µM sgRNA at a 1:1 ratio in 20 µM Hepes (pH 7.5), 150 mM KCl, 1 mM MgCl2, 10% (vol/vol) glycerol, and 1 mM tris(2-carboxyethyl)phosphine (TCEP) at 37 °C for 10 min to a final concentration of 10 µM. The RNPs are then electroporated into HeLa cells. As a negative control, the HeLa cells are electroporated with an unmodified nuclease (e.g., Cas9) in the absence of CPP. The uptake efficiency of the modified looped nuclease is compared to uptake of the unmodified looped nuclease using fluorescence.
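The concentrations in this paragraph are consistent with simple dilution arithmetic: combining equal volumes of two stocks at 20 concentration units each halves both, giving 10 units of each component and a 1:1 molar ratio. The Python sketch below checks this. It assumes that "1:1" refers to equal volumes and that nothing else is added to the mix; the function name and the example volumes are hypothetical, and the values carry whatever concentration unit the stocks are quoted in.

```python
def rnp_final_concentrations(cas9_conc: float, cas9_vol: float,
                             sgrna_conc: float, sgrna_vol: float):
    """Return (Cas9 final conc, sgRNA final conc, sgRNA:Cas9 molar ratio).

    Assumption: the two stocks are simply combined, so the total volume is the
    sum of the two volumes and each component is diluted accordingly.
    """
    total_vol = cas9_vol + sgrna_vol
    if total_vol <= 0:
        raise ValueError("total volume must be positive")
    cas9_final = cas9_conc * cas9_vol / total_vol
    sgrna_final = sgrna_conc * sgrna_vol / total_vol
    return cas9_final, sgrna_final, sgrna_final / cas9_final


if __name__ == "__main__":
    # Hypothetical volumes: 5 volume units of each 20-unit stock.
    print(rnp_final_concentrations(20.0, 5.0, 20.0, 5.0))
```

If buffer is added separately rather than being part of each stock, its volume would need to be included in the total, which would lower both final concentrations.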

Example 5: Conjugation of CPP to Cas nuclease

[0259] CPP is conjugated to Cas nuclease, such as Cas9 and one or more Cas9 variants (see Table G, below) using one of three possible conjugation methods: site-specific thiol-maleimide conjugation (for engineered cysteine variants), N-terminal conjugation, or lysine conjugation. The CPP and Cas nuclease may be conjugated by any means known in the art. For example, the CPP and the Cas nuclease may be conjugated by the enzymatic oxidation of tyrosine residues followed by reaction with cysteine thiols to allow covalent coupling as substantially described in Lobba et al. (2020) “Site-Specific Bioconjugation through Enzyme-Catalyzed Tyrosine-Cysteine Bond Formation” ACS Cent. Sci. 6(9):1564-1571, the contents of which are incorporated by reference herein.

[0260] Cas9 belongs to the class 2 type II CRISPR systems and is the most widely used genome editing tool. Streptococcus pyogenes Cas9 (SpCas9) was the first Cas nuclease to be used for genome editing in mammalian cells. However, recognition of the PAM sequence 5’-NGG (N represents any nucleotide) limits the target sites in the human genome. To increase the availability of target sites, Cas9 variants have been generated with altered PAM specificities, some of which are listed in Table G, below. See Pickar-Oliver and Gersbach (2019) “The next generation of CRISPR-Cas technologies and applications” Nat. Rev. Mol. Cell Biol. 20(8):490-507, the contents of which are incorporated by reference herein.

Table G: Cas9 variants with altered PAM and targeting specificities
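To illustrate how a PAM requirement constrains the available target sites, the sketch below lists every candidate protospacer in a sequence that is immediately followed by a 5’-NGG PAM, scanning both strands. It is an illustrative example only: the function names, the 20-nucleotide spacer length, and the test sequence are assumptions, and the `pam_regex` parameter is a hypothetical way to model a variant with a different PAM.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pam_sites(seq: str, spacer_len: int = 20,
                   pam_regex: str = "[ACGT]GG") -> list[tuple[str, str, str]]:
    """List (strand, protospacer, PAM) for every PAM with a full-length spacer.

    The default pattern models the SpCas9 5'-NGG PAM; a different pattern
    can be supplied to model a PAM-altered variant (assumption).
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping PAMs are all reported.
        for m in re.finditer(f"(?=({pam_regex}))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                spacer = s[pam_start - spacer_len:pam_start]
                sites.append((strand, spacer, m.group(1)))
    return sites


# Hypothetical sequence containing a single NGG site on the top strand.
print(find_pam_sites("A" * 20 + "TGG" + "A" * 5))
# [('+', 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
```

Passing a less restrictive pattern for `pam_regex` would generally return more candidate sites for the same sequence, which is the motivation for PAM-altered variants.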


Example 6: In vitro DNA cleavage assay

[0261] Conjugation efficiency and retention of activity for the CPP-Cas conjugates prepared in Example 5 are evaluated by assembly of the RNP complex and evaluation of nuclease activity in an in vitro DNA cleavage assay, for example, substantially as described by Cromwell and Hubbard (2021) “In vitro assays for comparing the specificity of First- and Next-Generation CRISPR/Cas9 systems” Methods Mol. Biol. 2162:215-232, the contents of which are incorporated by reference herein.

Example 7: Intracellular uptake of CPP-conjugated Cas nuclease

[0262] Intracellular uptake of the CPP-conjugated Cas nuclease prepared in Example 5 is evaluated by transfecting cells with sgRNA and treating the transfected cells with the CPP-conjugated Cas nuclease. CPP conjugation to the Cas nuclease is quantified by mass spectrometry. Cas nuclease activity is evaluated using an enzyme activity assay with a DNA substrate, such as a reporter construct, substantially as described in Martin et al. (2018) “A fluorescent reporter for quantification and enrichment of DNA editing by APOBEC-Cas9 or cleavage by Cas9 in living cells” Nucleic Acids Res. 46(14):e84, the contents of which are incorporated by reference herein.

Example 8: Intracellular uptake of CPP-conjugated RNP

[0263] Intracellular uptake of CPP-conjugated RNPs is evaluated as described in Example 3. CPP conjugation to the RNP is quantified by mass spectrometry. Nuclease activity is evaluated using an enzyme activity assay using a DNA substrate as described in Example 7.

Incorporation by Reference

[00263] All references, articles, publications, patents, patent publications, and patent applications cited herein are incorporated by reference in their entireties for all purposes. However, mention of any reference, article, publication, patent, patent publication, and patent application cited herein is not, and should not be taken as, an acknowledgment or any form of suggestion that they constitute valid prior art or form part of the common general knowledge in any country in the world.
