

Title:
POLYPEPTIDE FUSIONS OR CONJUGATES FOR GENE EDITING
Document Type and Number:
WIPO Patent Application WO/2024/086596
Kind Code:
A1
Abstract:
Provided herein are improved methods, compositions, and systems for editing genomic DNA, including editing genomic DNA by inserting long DNA sequences into genomic DNA.

Inventors:
BIBILLO AREK (US)
PATEL PRANAV (US)
Application Number:
PCT/US2023/077111
Publication Date:
April 25, 2024
Filing Date:
October 17, 2023
Assignee:
4M GENOMICS INC (US)
International Classes:
C12N9/22; C12N9/12; C12N9/90; C12N15/11; C12N15/90
Attorney, Agent or Firm:
TAVSHANJIAN, Brandon (US)
Claims:
CLAIMS

WHAT IS CLAIMED IS:

1. A composition comprising a fusion protein comprising:

(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;

(b) a heterologous domain comprising at least about 100 amino acids; and

(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.

2. The composition of claim 1, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.

3. The composition of claim 1 or 2, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

4. The composition of any one of claims 1-3, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.

5. The composition of any one of claims 1-4, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

6. The composition of claim 5, wherein said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue.

7. The composition of claim 5 or 6, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.

8. The composition of any one of claims 1-7, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.

9. The composition of any one of claims 1-5, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.

10. The composition of claim 9, wherein said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof.

11. The composition of any one of claims 1-6, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.

12. The composition of claim 11, wherein said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.

13. The composition of any one of claims 1-6, wherein said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55.

14. The composition of any one of claims 1-13, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise a same Cas12 polypeptide.

15. The composition of any one of claims 1-13, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise different Cas12 polypeptides.

16. The composition of any one of claims 1-15, wherein said first Cas12 polypeptide and said second Cas12 polypeptide do not comprise an inactivating mutation in an active site residue of said first Cas12 polypeptide or said second Cas12 polypeptide.

17. The composition of any one of claims 1-15, wherein said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.

18. The composition of any one of claims 1-15, wherein said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43.

19. The composition of any one of claims 1-18, further comprising an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.

20. The composition of claim 19, wherein said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.

21. The composition of claims 19 or 20, wherein said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.

22. The composition of any one of claims 19-21, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

23. The composition of any one of claims 19-22, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

24. The composition of claim 22, further comprising a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide, wherein

(a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and

(b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

25. The composition of claim 24, wherein said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.

26. The composition of any one of claims 19-22, or 24-25, wherein said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.

27. The composition of any one of claims 1-25, further comprising a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.

28. The composition of claim 26, wherein said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.

29. The composition of claim 26, wherein said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.

30. A composition comprising a fusion protein comprising:

(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;

(b) a second segment comprising either:

(i) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or

(ii) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and

(c) a third segment comprising a heterologous domain of at least about 100 amino acids.

31. The composition of claim 30, wherein said first segment further comprises WED, REC1, RuvC, REC2, and Nuc domains of said first Cas12 polypeptide.

32. The composition of claim 30 or 31, wherein said Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

33. The composition of any one of claims 30-32, wherein said heterologous domain comprises at least about 100-900 amino acids in length.

34. The composition of any one of claims 30-33, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

35. The composition of any one of claims 30-34, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity.

36. The composition of claim 35, wherein said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue.

37. The composition of claim 35 or 36, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.

38. The composition of any one of claims 30-37, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.

39. The composition of any one of claims 30-35, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.

40. The composition of claim 39, wherein said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof.

41. The composition of any one of claims 30-36, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.

42. The composition of claim 41, wherein said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.

43. The composition of any one of claims 30-36, wherein said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55.

44. The composition of any one of claims 30-43, wherein said first Cas12 polypeptide and said second Cas12 polypeptide are a same Cas12 polypeptide.

45. The composition of any one of claims 30-43, wherein said first Cas12 polypeptide and said second Cas12 polypeptide are different Cas12 polypeptides.

46. The composition of any one of claims 30-45, wherein said first Cas12 polypeptide and said second Cas12 polypeptide do not comprise an inactivating mutation in an active site residue of said first Cas12 polypeptide or said second Cas12 polypeptide.

47. The composition of any one of claims 30-45, wherein said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.

48. The composition of any one of claims 30-45, wherein said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43.

49. The composition of any one of claims 30-48, further comprising an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.

50. The composition of claim 49, wherein said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.

51. The composition of claims 49 or 50, wherein said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.

52. The composition of any one of claims 49-51, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

53. The composition of any one of claims 49-52, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

54. The composition of claim 52, wherein said composition further comprises a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide and configured to hybridize to said DNA site, wherein

(a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and

(b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

55. The composition of claim 54, wherein said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.

56. The composition of any one of claims 30-52, and 54-55, wherein said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.

57. The composition of any one of claims 30-55, further comprising a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.

58. The composition of claim 56, wherein said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.

59. The composition of claim 56, wherein said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.

60. A fusion protein comprising a sequence having at least 80% identity to any one of SEQ ID Nos: 20-23.

61. A method of editing a nucleic acid site in a cell, comprising contacting to said cell said composition of any one of claims 1-60.

62. The method of claim 61, wherein said cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.

63. A method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site

(i) a fusion protein comprising:

(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;

(b) a heterologous domain comprising at least about 100 amino acids; and

(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site;

(ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and

(iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.

64. The method of claim 63, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.

65. The method of claim 63 or 64, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

66. The method of any one of claims 63-65, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.

67. The method of claim 66, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

68. The method of any one of claims 63-67, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise a same Cas12 polypeptide.

69. The method of any one of claims 63-68, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise different Cas12 polypeptides.

70. The method of any one of claims 63-69, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

71. The method of any one of claims 63-70, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

72. The method of claim 71, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

73. A method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site

(i) a fusion protein comprising:

(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;

(b) a second segment comprising either:

(A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or

(B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and

(c) a third segment comprising a heterologous domain of at least about 100 amino acids;

(ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and

(iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.

74. The method of claim 73, wherein said Cas12 polypeptide, said first Cas12 polypeptide, or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

75. The method of claim 73 or 74, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.

76. The method of any one of claims 73-75, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

77. The method of any one of claims 73-76, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

78. The method of any one of claims 73-77, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

79. The method of claim 78, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

80. A kit for disrupting a DNA site, comprising

(i) a fusion protein comprising:

(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;

(b) a heterologous domain comprising at least about 100 amino acids; and

(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and

(ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.

81. The kit of claim 80, further comprising (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.

82. The kit of claim 80, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.

83. The kit of claim 80 or 81, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

84. The kit of any one of claims 80-83, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.

85. The kit of any one of claims 80-84, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

86. The kit of any one of claims 81-85, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

87. The kit of any one of claims 81-86, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

88. The kit of claim 87, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

89. The kit of any one of claims 80-88, further comprising a transfection agent.

90. The kit of any one of claims 80-89, further comprising instructions for targeting said DNA site.

91. A kit for disrupting a DNA site, comprising

(i) a fusion protein comprising:

(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;

(b) a second segment comprising either:

(A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or

(B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide;

(ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.

92. The kit of claim 91, further comprising (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.

93. The kit of claim 91 or 92, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.

94. The kit of any one of claims 91-93, said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.

95. The kit of any one of claims 91-94, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.

96. The kit of any one of claims 91-95, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.

97. The kit of any one of claims 92-96, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.

98. The kit of any one of claims 92-97, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.

99. The kit of claim 98, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.

100. The kit of any one of claims 91-99, further comprising a transfection agent.

101. The kit of any one of claims 91-100, further comprising instructions for targeting said DNA site.

Description:
POLYPEPTIDE FUSIONS OR CONJUGATES FOR GENE EDITING

CROSS-REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application No. 63/380,047 filed October 18, 2022, entitled “POLYPEPTIDE FUSIONS OR CONJUGATES FOR GENE EDITING”, which application is incorporated by reference herein in its entirety.

BACKGROUND

[0002] Programmable nucleases such as CRISPR-associated Cas endonucleases have revolutionized the ability to perform gene editing in organisms in a precise, site-directed manner.

SEQUENCE LISTING

[0003] The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on October 5, 2023, is named 59409-70260 l_Seq_Listing.xml and is 136,990 bytes in size.

SUMMARY

[0004] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
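By way of illustration only, the three-part arrangement recited above can be sketched as a small data model. The following Python sketch is not part of the disclosure: the names `Segment` and `is_split_fusion`, the N-terminal to C-terminal ordering of the three parts, the hard 100-residue cutoff standing in for "at least about 100 amino acids", and all example lengths are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One contiguous part of a fusion protein (hypothetical model)."""
    name: str                 # free-form label, e.g. "n_fragment"
    domains: Tuple[str, ...]  # domain names carried by this part
    length_aa: int            # length in amino acids


def is_split_fusion(segments: Sequence[Segment], min_insert_aa: int = 100) -> bool:
    """Check the assumed layout:
    [fragment with WED and REC1] - [heterologous domain] - [fragment with RuvC and Nuc].
    """
    if len(segments) != 3:
        return False
    first, middle, last = segments
    return (
        {"WED", "REC1"} <= set(first.domains)      # first fragment carries both domains
        and middle.length_aa >= min_insert_aa      # heterologous domain is long enough
        and {"RuvC", "Nuc"} <= set(last.domains)   # second fragment carries both domains
    )


# Example with made-up lengths; the optional REC2 domain sits in the second fragment.
example = [
    Segment("n_fragment", ("WED", "REC1"), 200),
    Segment("polymerase", ("DNA polymerase",), 600),
    Segment("c_fragment", ("RuvC", "REC2", "Nuc"), 250),
]
print(is_split_fusion(example))
```

The check uses subset tests so that a second fragment that additionally carries a REC2 domain still satisfies the layout.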

[0005] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (ii) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids.

[0006] In some aspects, the present disclosure provides for a fusion protein having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any of the polypeptides described herein.
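By way of illustration only, a percent-identity threshold of the kind recited above can be evaluated on two sequences that have already been aligned. The following Python sketch is not part of the disclosure: it assumes a pairwise alignment is supplied with `-` as the gap character, and it counts identity as matching columns divided by all aligned columns, which is only one of several conventions in use. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a precomputed pairwise alignment.

    Assumption: identity = identical residue columns / all alignment columns,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no residue columns")
    matches = sum(1 for a, b in columns if a == b)  # a == b implies neither is a gap here
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up ten-residue peptides differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

A different denominator (for example, the length of the shorter sequence) would give different values near a threshold, so the convention should be fixed before comparing against a stated percentage.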

[0007] In some aspects, the present disclosure provides for a method of editing a nucleic acid site in a cell, comprising contacting said cell with any of the compositions described herein.

[0008] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and (iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.

[0009] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and (iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
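By way of illustration only, the requirement that the insert DNA molecule carry a region with complementarity to a region flanking the DNA site can be checked on plain sequence strings. The following Python sketch is not part of the disclosure: it assumes exact Watson-Crick complementarity over an unambiguous A/C/G/T alphabet, reads both strings 5' to 3', and uses a 4-nucleotide minimum taken from the lower bound recited in the claims. The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_complementary_region(insert: str, flank: str, min_len: int = 4) -> bool:
    """True if the insert contains a stretch of at least `min_len` nucleotides
    that is exactly complementary to some stretch of the flanking region.
    """
    target = reverse_complement(flank)
    insert = insert.upper()
    # Any longer complementary stretch contains a min_len window, so testing
    # windows of exactly min_len is sufficient for a yes/no answer.
    for start in range(len(target) - min_len + 1):
        if target[start:start + min_len] in insert:
            return True
    return False


# Made-up sequences: the insert carries the reverse complement of the flank.
flank_5prime = "GATTACA"
insert_dna = "CCC" + reverse_complement(flank_5prime) + "GGG"
print(has_complementary_region(insert_dna, flank_5prime))
```

Real designs would typically tolerate mismatches and consider strand choice; this sketch only demonstrates the exact-match case.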

[0010] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and RECI domains of a first Cast 2 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cast 2 polypeptide, wherein said first and second Cast 2 polypeptide are configured to bind a doublestranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with said first Cast 2 polypeptide or said second Cast 2 polypeptide and configured to hybridize to said DNA site. [0011] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first segment comprising a Casl2 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and RECI domains of a first Casl2 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Casl2 polypeptide; and (ii) a guide polynucleotide configured to interact with said first Cast 2 polypeptide or said second Cast 2 polypeptide and configured to hybridize to said DNA site. [0012] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and RECI domains derived from a Casl2 enzyme; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains from a Cast 2 enzyme, wherein said Casl2 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site. In some embodiments, said second fragment further comprises a REC2 domain derived from a Casl2 enzyme. In some embodiments, said Casl2 enzyme is a Class 2, Type V-F or a Casl2f enzyme. 
In some embodiments, said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain. In some embodiments, said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain. In some embodiments, said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55. In some embodiments, said first fragment and said second fragment are derived from a same Cas12 enzyme. In some embodiments, said first fragment and said second fragment are derived from a different Cas12 enzyme. In some embodiments, said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme. In some embodiments, said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof. In some embodiments, said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43. In some embodiments, said composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site. In some embodiments, said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
In some embodiments, said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb. In some embodiments, said insert nucleic acid sequence comprises at least about 100 bp, 250 bp, 500 bp, 750 bp, 1 kb, 1.2 kb, 1.5 kb, 1.7 kb, 2.0 kb, 2.5 kb, 3 kb, 3.5 kb, 4 kb, 4.5 kb, 5 kb, 6 kb, 6.5 kb, 7 kb, 7.5 kb, 8 kb, 8.5 kb, 9 kb, 9.5 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, or any range between these values. In some embodiments, said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme. In some embodiments, said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA. In some embodiments, said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA. In some embodiments, said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.
In some embodiments, said composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment. In some embodiments, said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, said linker comprises LPXTG (SEQ ID NO: 79), GGG (SEQ ID NO: 80), (GGG)n (SEQ ID NO: 81), (GGGGS)n (SEQ ID NO: 82), (GGGS)n (SEQ ID NO: 83), N1-7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
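The fragment order and repeat-linker options described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function names `repeat_linker` and `assemble_fusion` are hypothetical, the amino acid strings are arbitrary placeholders rather than any SEQ ID NO, and "at least about 100 amino acids" is treated as a hard floor of 100 purely for illustration.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat linker such as (GGGGS)n into a plain amino acid string."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def assemble_fusion(
    first_fragment: str,
    heterologous: str,
    second_fragment: str,
    linker_n: str = "",
    linker_c: str = "",
    min_heterologous_len: int = 100,
) -> str:
    """Concatenate the parts in N-to-C order: first fragment, optional linker,
    heterologous domain, optional linker, second fragment.

    Assumption: the length condition on the heterologous domain is applied
    as a strict minimum; the source text says "at least about".
    """
    if len(heterologous) < min_heterologous_len:
        raise ValueError("heterologous domain is shorter than the chosen minimum")
    return first_fragment + linker_n + heterologous + linker_c + second_fragment


# Placeholder sequences only; they do not correspond to real domains.
fusion = assemble_fusion(
    first_fragment="A" * 40,       # stands in for a WED/REC1 fragment
    heterologous="K" * 120,        # stands in for a polymerase or topoisomerase domain
    second_fragment="L" * 60,      # stands in for a RuvC/Nuc fragment
    linker_n=repeat_linker("GGGGS", 3),
    linker_c=repeat_linker("GGGS", 2),
)
print(len(fusion))
```

Keeping the linkers as optional arguments mirrors the text, in which a linker may sit on either side of the heterologous domain or be absent.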

[0013] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 enzyme configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains derived from a Cas12 enzyme; or (ii) a sequence comprising RuvC, REC2, and Nuc domains from a Cas12 enzyme; and (c) a third segment comprising a heterologous domain of at least about 10, 25, 50, 75, or 100 amino acids. In some embodiments, said first domain further comprises WED, REC1, RuvC, REC2, and Nuc domains from said Cas12 enzyme. In some embodiments, said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 900 amino acids in length. In some embodiments, said heterologous domain comprises at least about 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue.
In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain. In some embodiments, said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain. In some embodiments, said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55, or a variant thereof.
In some embodiments, said first fragment and said second fragment are derived from a same Cas12 enzyme. In some embodiments, said first fragment and said second fragment are derived from a different Cas12 enzyme. In some embodiments, said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme. In some embodiments, said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof. In some embodiments, said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43, or a variant thereof. In some embodiments, the composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site. In some embodiments, said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp. In some embodiments, said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb. In some embodiments, said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme.
In some embodiments, said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA. In some embodiments, said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA. In some embodiments, said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end. In some embodiments, the composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment. In some embodiments, said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1-7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.

[0014] In some aspects, the present disclosure provides for a fusion protein comprising a sequence having at least 80% identity to any one of SEQ ID NOs: 20-23, or a variant thereof.

[0015] In some aspects, the present disclosure provides for a system comprising any of the components of the compositions described herein.
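The "at least 80% identity" thresholds recited above can be made concrete with a small Python sketch. This passage does not state how identity is to be calculated, so the convention used here is an assumption: the two sequences are supplied already aligned (gaps written as `-`), and identity is the number of matching columns divided by the number of alignment columns. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: matches / alignment columns, where columns that are
    gaps in both sequences are ignored and a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy peptides, not sequences from the disclosure: 9 of 10 columns match.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(meets_identity_threshold("ACDEFGHIKL", "ACDEFGHIKV"))
```

Other denominators (for example the length of the shorter sequence) are also in common use and can move a borderline pair across the 80% line, so the convention should be fixed before comparing against a threshold.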

[0016] In some aspects, the present disclosure provides for a method of editing a DNA locus in a cell comprising contacting to a cell any of the compositions described herein, or a nucleic acid comprising or encoding any of the compositions described herein. In some embodiments, the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.

[0017] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

[0018] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

[0020] Figure 1 (FIG. 1) depicts a schematic representation of split Cas12f functional domains. Examples are based on two Cas12f proteins: Sp Cas12f and Pt Cas12f. It was considered that fA and fB can be integrated with different intervening heterologous domains to add new enzymatic activity while simultaneously preserving specific guided DNA cleavage activity. Abbreviations: Nuclease lobe (NUC), Wedge (WED), Recognition lobe REC1 and REC2, Nuclease (Lid RuvC).

[0021] Figure 2 (FIG. 2) depicts examples of split Cas12f proteins incorporating various heterologous polymerase domains. The Cas12f (Sp Cas12f and Pt Cas12f) functional domains were extracted and integrated with heterologous sequences (central bar: DNA polymerase Bst or Phi29) such that the Cas12f components retain structural/functional integrity. The polymerase domain is positioned in the structure so that it can best support activity of the polymerase on DNA.

[0022] Figure 3 (FIG. 3) depicts examples of split Cas12f proteins incorporating SpyCatcher domains to allow linkage to an additional protein in vitro.

[0023] Figures 4A and 4B (FIGs. 4A and 4B) depict examples of an alternative anchor/connector design incorporating one functional SpCas12f domain and an additional Cas12f fragment to allow for anchoring of the heterologous domain at the proper place in the complex. In this embodiment, the natural dimer structure of Cas12f is used to provide a scaffold. The leftmost fragment is a full Sp Cas12f monomer; the central domains include SpyCatcher (which serves here as a linker but can also be used to attach different heterologous polypeptides). The rightmost domain is a structural monomer used in a minimal truncated form whose role is limited to that of an anchor/connector for an accessory protein (grey box, e.g. DNA polymerase or Topoisomerase). In some embodiments, SpyCatcher can be replaced with linker sequences.

[0024] Figures 5A and 5B (FIGs. 5A and 5B) depict expression/purification and activity assays for Sp and Pt Cas12f and derivative constructs. These assays illustrate that domains extracted from Cas12f and integrated with SpyCatcher retain specific guided cleavage activity.

[0025] Figure 5A (FIG. 5A) depicts expression and activity assays for a first batch of split Cas12f constructs. The top panel shows SDS-PAGE of the purification products in Example 1. The analyses verified production of all variants described in the example. Lane order: 1- Pt Cas12f N-term SpyTag, 2- Sp Cas12f N-term SpyTag, 3- Pt Cas12f C-term SpyTag, 4- Sp Cas12f C-term SpyTag, 5- Pt integral SpyCatcher, 6- Sp integral SpyCatcher. The bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1. The substrate and cleavage products are annotated. The assay demonstrates that all the variants retain wild-type-like activity. Lane order: 1- Sp Cas12f, 2- Pt Cas12f, 3- Sp integral SpyCatcher, 4- Pt integral SpyCatcher, 5- Pt Cas12f C-term SpyTag, 6- 1-Pt Cas12f N-term SpyTag, M- molecular weight marker.

[0026] Figure 5B (FIG. 5B) depicts expression and activity assays for a second batch of split Cas12f constructs. The top panel shows SDS-PAGE of the purification products in Example 1. The analysis shows good yield for all variants described in the example. Lane order: 1- Sp integral BstPol, 2- Pt integral BstPol, M- molecular weight marker. The bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1. The substrate and cleavage products are annotated. The assay demonstrates that all the variants retain wild-type-like activity. Lane order: 1- Sp Cas12f, 2- Pt Cas12f, 3- Sp integral BstPol, 4- Pt integral BstPol, 5- Sp target DNA, M- molecular weight marker.

[0027] Figure 6 (FIG. 6) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The endonuclease catalyzes DNA damage at a specified site to induce DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. The DNA insert in this case is presented by two Cas polypeptides which specifically recognize and bind target DNA.

[0028] Figure 7 (FIG. 7) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, one end of the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA, while the other end is free. In this model, the endonuclease catalyzes DNA damage at a specified site to induce DNA repair incorporating the DNA insert. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas polypeptide (e.g. enzyme) that specifically recognizes and binds target DNA.

[0029] Figure 8 (FIG. 8) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair.

[0030] Figure 9 (FIG. 9) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs according to the current disclosure with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas-topoisomerase.

[0031] Figure 10 (FIG. 10) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. To make the topoisomerase reaction irreversible, a second accessory protein, an exonuclease, damages the ssDNA in the vicinity of the cut. Damage to the target DNA induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by two Cas-topoisomerases which are specifically covalently attached to target DNA.

[0032] Figure 11 (FIG. 11) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. To make the topoisomerase reaction irreversible, a second accessory protein, an exonuclease, damages the ssDNA in the vicinity of the cut. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas polypeptide that specifically recognizes and binds target DNA.

[0033] Figures 12A and 12B (FIGs. 12A and 12B) illustrate various DNA insert formats that can be used with Cas (e.g. Cas12f) polypeptides (e.g. enzymes) to facilitate homologous recombination at a target site.

[0034] Figure 12A shows several diagrams of attachment of gRNAs to DNA inserts. In panel (A), double Cas polypeptides are attached to the insert with linear dsDNA hybridization; the insert includes the target homology sequence required for the editing on either end. In panel (B), a single Cas polypeptide is attached with linear dsDNA hybridization via one side of the insert, while the other side is protected from exonucleolytic degradation with modified nucleotides (e.g. using nucleotide modifications such as C3'-spacer, hexanediol, 1',2'-Dideoxyribose/dSpacer, PC Spacer, Spacer 9, or Spacer 18, or bond modifications such as phosphorothioate bonds). In panel (C), a single Cas polypeptide is attached to dumbbell dsDNA containing an insert: one side of the insert is attached to Cas via oligo hybridization, while the other side lacks a hybridization site and simply has a dumbbell loop. As in the case of the linear dsDNA insert in panels (A) and (B), the dumbbell includes two target homology sequences required to complete homologous recombination. In panel (D), double Cas polypeptides are attached to dumbbell dsDNA containing an insert analogous to panel (C) but having a second hybridization site for an oligo that binds a gRNA on both sides of the dumbbell. In panel (E), a single Cas polypeptide is attached to dumbbell dsDNA containing an insert, but via a hybridization site internal to the homology regions.

[0035] Figure 12B shows an example dumbbell sequence (5'-GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGGGGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGAGAAATCCCTCCTCTCCTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGCTGTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3', SEQ ID NO: 74) that can be used in the embodiments described in Figure 12A, where underlined sequences (5'-GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGGGGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGA-3' and 5'-TCCCTCCTCTCCTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGCTGTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTC-3') denote stem parts of the dumbbell and italicized letters (5'-GAAA-3' and 5'-GTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3') denote single strand parts of the dumbbell. The last italicized sequence (5'-GTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3') can be used as a gRNA hybridization site. Such a dumbbell can be constructed by ligating the adaptor sequences (bolded) using e.g. T4 DNA ligase.
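The stem/loop layout of the example dumbbell can be checked with a short Python sketch. The sequences below are transcribed from the Figure 12B description with the extraction spaces removed; the variable and function names are illustrative only, and treating the 3' tail as twelve GTCA repeats is a reading of the printed sequence, not a statement from the source.

```python
# Case-preserving complement table for DNA letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


# First stem arm, written in short blocks to make transcription errors easier to spot.
STEM_A = (
    "GAAAGGAAGC" "CCTGCTTCCT" "CCAGAGGGCG" "TCGCAGGACA"
    "GCTTTTCCTA" "GACAGGGGCT" "AGTATGTGCA"
    "tttcctgatgtcgatgtg"
    "CCAGGAGAGGAGGGA"
)
LOOP = "GAAA"  # single-stranded loop between the two stem arms
# Second stem arm.
STEM_B = (
    "TCCCTCCTCTCCTGG"
    "cacatcgacatcaggaaa"
    "TGCACATACT" "AGCCCCTGTC" "TAGGAAAAGC" "TGTCCTGCGA"
    "CGCCCTCTGG" "AGGAAGCAGG" "GCTTCCTTTC"
)
TAIL = "GTCA" * 12  # single-stranded 3' tail, usable as a gRNA hybridization site

dumbbell = STEM_A + LOOP + STEM_B + TAIL

# The two arms can base-pair into a stem only if one is the reverse complement of the other.
print(reverse_complement(STEM_A) == STEM_B)
print(len(STEM_A), len(LOOP), len(STEM_B), len(TAIL), len(dumbbell))
```

A check of this kind is a quick way to catch transcription errors when a hairpin or dumbbell sequence is copied out of a document before ordering oligonucleotides.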

[0036] Figure 13 (FIG. 13) shows a schematic representation of an editing design including Cas, DNA polymerase and DNA inserts according to the current disclosure. The schematic emphasizes three possible paths after specific target recognition, cleavage, and annealing of the 3'-ssDNA tail to the cleaved and displaced target strand: (A) a path where the displaced target DNA cleaved by Cas serves as a primer and the DNA insert is a template; (B) a path where the insert’s 3'-ssDNA tail serves as a primer and the cleaved target DNA serves as a template; (C) a path similar to path B but wherein the target DNA is not cleaved. The paths generate stable attachment of the insert to target DNA and involve the host DNA repair mechanism to finalize editing.

[0037] Figure 14A (FIG. 14A) shows a schematic explaining a method to incorporate long insert sequences where the insert has high homology to the target DNA. Shown are two methods with an accessory 3' to 5' exonuclease and 5' to 3' exonuclease (e.g. where exonucleases are incorporated as a heterologous polypeptide between or in addition to split Cas12f domains). One of the strands of the cleaved DNA target is digested by the 3' to 5' exonuclease or the 5' to 3' exonuclease. The digestion process competes with the host DNA repair mechanism (homologous recombination); as a result, the homology cross-point is moved away from the target site, increasing insert length.

[0038] Figure 14B (FIG. 14B) and Figure 14C (FIG. 14C) depict schematics showing how the designs in Figure 14A can be implemented using Cas/polymerase/exonuclease fusions.

[0039] Figures 15A, 15B, and 15C (FIGs. 15A, 15B, and 15C) depict polypeptide diagrams showing how Cas/polymerase/exonuclease fusions as shown in Figures 14B and 14C can be constructed at a domain level. FIG. 15A depicts examples using split Cas12f and T5 exonuclease or Exo III; FIG. 15B depicts examples using intact Cas12f and T5 exonuclease or Exo III; and FIG. 15C depicts examples using Cas9 or Cas12 alongside Bst Polymerase and T5 exonuclease or Exo III. The left and rightmost domains represent Cas12f Sp or Pt, the domains flanking the central domain are linkers, and the central domain is the exonuclease T5Exo or Exo III.

[0040] Figure 16 (FIG. 16) depicts an example hybrid gRNA/DNA insert usable with methods according to the disclosure. In this example, gRNA and insert DNA are covalently attached either by a standard 5'-phosphate-3' bond or by an alternative covalent conjugation method (e.g. chemical synthesis, splint ligation of RNA using T4 DNA ligase and a bridging DNA oligonucleotide complementary to an RNA, or any of the methods described in Huang et al. Nucleic Acids Research, Volume 24, Issue 21, 1 November 1996, Pages 4360-4361, which is incorporated by reference in its entirety herein, or Mack et al. Curr Protoc Chem Biol. 2016; 8(2): 83-95, which is incorporated by reference in its entirety herein). For example, such hybrids can be constructed by ligation of gRNA to a DNA insert, chemical synthesis, or primer extension starting from a gRNA primer using DNA Pol I or Klenow fragment.

DETAILED DESCRIPTION

[0041] There is a need for endonuclease compositions, methods, and systems that improve the efficiency of transgene insertions into precise locations for genomic editing. Insertion efficiencies of transgenes using CRISPR-Cas relying on simple homologous recombination can be in the single digits for large inserts, making approaches relying on such methods technically laborious. Provided herein are methods, compositions, and systems for improved gene editing, particularly involving large insert DNAs.

Definitions

[0042] The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.), PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R.I. Freshney, ed. (2010)) (which are entirely incorporated by reference herein).

[0043] As used herein, the term “programmable nuclease” generally refers to endonucleases that are “targeted” (“programmed”) to recognize and edit a pre-determined site in a genome of an organism. In an embodiment, the programmable nuclease can induce site-specific DNA cleavage at a pre-determined site in a genome. In an embodiment, the programmable nuclease may be programmed to recognize a genomic location with a DNA binding protein domain, or combination of DNA binding protein domains. Example features of Cas programmable nucleases or Cas polypeptides (e.g. enzymes) are described in, e.g. Makarova, et al. Nat. Rev. Microbiol., 18, 67-83, Shmakov et al. Nat. Rev. Microbiol., 15, 169-182, and Karvelis et al. Nucleic Acids Research, 2020, Vol. 48, No. 9, 5016-5023, each of which is incorporated by reference herein in its entirety. In some cases, the programmable nuclease is a Cas12 polypeptide, such as a Class 2, Type V-F polypeptide or a Cas12f polypeptide for which example domain organization, domain types (e.g. WED, REC1, RuvC, Nuc, and REC2 domains), functional residues, and structure (e.g. relative to SEQ ID NO: 84) are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes.

[0044] As used herein, a “guide nucleic acid” or “guide polynucleotide” generally refers to a nucleic acid that may hybridize to another nucleic acid. A guide nucleic acid may be RNA. A guide nucleic acid may be DNA. The guide nucleic acid may be programmed to bind specifically to a nucleic acid with a particular sequence. The nucleic acid to be targeted, or the target nucleic acid, may comprise nucleotides. The guide nucleic acid may comprise nucleotides. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be called the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid may be called a noncomplementary strand. A guide nucleic acid may comprise a polynucleotide chain and can be called a “single guide nucleic acid.” A guide nucleic acid may comprise two polynucleotide chains and may be called a “double guide nucleic acid.” If not otherwise specified, the term “guide nucleic acid” may be inclusive, referring to both single guide nucleic acids and double guide nucleic acids. Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA) and a protein binding sequence. Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA) a protein binding sequence, and a trans-activating RNA (e.g. a tracrRNA).
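
As an illustration of the strand terminology above, the following minimal Python sketch labels the strands of a double-stranded target as complementary or noncomplementary with respect to a guide. It is a simplification under stated assumptions: hybridization is modeled as an exact Watson-Crick match, no PAM is considered, and the function names and the example sequences are hypothetical and are not taken from this disclosure.

```python
# Illustrative sketch only: exact-match model of guide/target hybridization.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def classify_strands(top_strand, guide_rna):
    """Label the strands of a duplex relative to a guide.

    top_strand: one strand of the duplex, written 5'->3'.
    guide_rna:  targeting segment of the guide, written 5'->3'.
    Returns a dict naming the complementary and noncomplementary strands,
    or None if no exact site is found (assumption: no mismatches allowed).
    """
    top = top_strand.upper()
    bottom = reverse_complement(top)
    spacer_as_dna = guide_rna.upper().replace("U", "T")
    # The strand that base pairs with the guide carries its reverse complement.
    hybridizing_site = reverse_complement(spacer_as_dna)
    if hybridizing_site in top:
        return {"complementary": "top", "noncomplementary": "bottom"}
    if hybridizing_site in bottom:
        return {"complementary": "bottom", "noncomplementary": "top"}
    return None


if __name__ == "__main__":
    print(classify_strands("GGATCCACGTTGCAAGCTTAG", "ACGUUGCAAG"))
```

In this toy case the top strand contains the same sequence as the guide, so the bottom strand is the one that hybridizes with the guide and is reported as the complementary strand.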

[0045] A guide nucleic acid may comprise a segment that can be referred to as a “nucleic acid-targeting segment”, a “nucleic acid-targeting sequence”, or a “seed sequence”. In some cases, the sequence is 19-21 nucleotides in length. In some cases, a “nucleic acid-targeting segment” or a “nucleic acid-targeting sequence” comprises a crRNA. A nucleic acid-targeting segment may comprise a sub-segment that may be referred to as a “protein binding segment” or “protein binding sequence” or “Cas protein binding segment”.

[0046] The term “guide RNA”, as used herein, can generally refer to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence (e.g., a type V guide RNA from S. pyogenes, S. aureus, etc). Guide RNA can refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence. Guide RNA may refer to a modified form of a guide RNA that can comprise a nucleotide change such as a deletion, insertion, or substitution, variant, mutation, or chimera. A guide RNA may refer to a nucleic acid that can be at least about 60% identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides. For example, a guide RNA sequence can be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100 % identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides.
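
One way to read the criterion of identity "over a stretch of at least 6 contiguous nucleotides" is sketched below in Python. The sketch assumes an ungapped comparison of fixed-length windows and reports the best identity found over any pair of windows. The function names, the default window of 6, and the example sequences are illustrative assumptions; the disclosure does not prescribe this procedure.

```python
# Illustrative sketch only: ungapped, fixed-window identity between two sequences.
def best_window_identity(candidate, reference, window=6):
    """Return the highest fractional identity between any ungapped
    window of `window` contiguous positions in the two sequences."""
    candidate, reference = candidate.upper(), reference.upper()
    best = 0.0
    for i in range(len(candidate) - window + 1):
        for j in range(len(reference) - window + 1):
            pairs = zip(candidate[i:i + window], reference[j:j + window])
            matches = sum(a == b for a, b in pairs)
            best = max(best, matches / window)
    return best


def meets_identity_threshold(candidate, reference, threshold=0.60, window=6):
    """True if some window reaches the threshold (e.g. at least about 60%)."""
    return best_window_identity(candidate, reference, window) >= threshold


if __name__ == "__main__":
    print(best_window_identity("AAAACC", "AAAAGG"))      # 4 of 6 positions match
    print(meets_identity_threshold("AAAACC", "AAAAGG"))  # compared against 0.60
```

A gapped alignment, as produced by the sequence comparison algorithms named in the definition of “sequence identity”, would be an alternative way to evaluate the same criterion.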

[0047] The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with parameters of ; the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters retree of 2 and maxiterations of 1000; Novafold with default parameters; and HMMER hmmalign with default parameters.
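
The Smith-Waterman parameters recited above (match of 2, mismatch of -1, gap of -1) can be illustrated with the following minimal Python sketch. It assumes a linear gap penalty and computes percent identity as identical columns divided by the length of the local alignment, which is only one of several conventions in use. The function names and test sequences are hypothetical and are not part of this disclosure.

```python
# Illustrative sketch only: Smith-Waterman local alignment with a linear gap penalty.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of alignment length (assumed convention)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    score, top, bottom = smith_waterman("AAAATAAAA", "AAAACAAAA")
    print(score, top, bottom, round(percent_identity(top, bottom), 1))
```

For production use, an established implementation of one of the algorithms named above would normally be preferred over a hand-written routine such as this one.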

[0048] The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acids residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.

[0049] As used herein, the term “Wedge” (WED) domain generally refers to a domain (e.g. present in a Cas protein) interacting primarily with the repeat:anti-repeat duplex of the sgRNA and the PAM duplex. A WED domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences.

[0050] As used herein, the term “REC domain” generally refers to a domain (e.g. present in a Cas protein) comprising at least one of two segments (REC1 or REC2) that are alpha helical domains thought to contact the guide RNA. A REC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam PF19501 for domain REC1).

[0051] The term “pharmaceutically acceptable carrier” or “pharmaceutically acceptable excipient” as used herein generally refers to a diluent, adjuvant, excipient, or vehicle with which a probe of the disclosure is administered and which is approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans. Such pharmaceutical carriers can be liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. The pharmaceutical carriers can be saline, gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like. When administered to a patient, the probe and pharmaceutically acceptable carriers can be sterile. Water can be a useful carrier when the composition is administered intravenously. Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions. Suitable pharmaceutical carriers also include excipients such as glucose, lactose, sucrose, glycerol monostearate, sodium chloride, glycerol, propylene glycol, water, ethanol and the like. The present compositions, if desired, can also contain minor amounts of wetting or emulsifying agents, or pH buffering agents. The present compositions may take the form of solutions, emulsions, sustained-release formulations, or any other form suitable for use. In some cases the pharmaceutically acceptable excipient may comprise a transfection agent. Suitable transfection agents include, but are not limited to, linear or branched polyethylenimines (see e.g. Bonnet et al., (2008) Pharmaceut. Res. 25: 2972-2982, which is incorporated by reference herein in its entirety for all purposes), nanoparticles, lipid nanoparticles (LNPs, see e.g. Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes), lipophilic particles, peptides, micelles, dendrimers, hydrogels, synthetic or naturally derived exosomes, polymeric compositions, virus-like particles (see e.g. Lisziewicz et al., (2012) PLoS ONE 7:e35416, which is incorporated by reference herein in its entirety for all purposes), and any combination thereof.

[0052] Included in the current disclosure are variants of any of the enzymes, polypeptides, proteins, or domains described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., nonconserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of the endonuclease protein sequences described herein. In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues or guide polynucleotide binding residues of the endonuclease are not disrupted.
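
As a minimal sketch of how conservative substitutions might be screened in practice, the Python example below encodes the eight substitution groups enumerated in paragraph [0054] of this disclosure and labels each difference between a reference sequence and an equal-length variant. The function names and the short test sequences are hypothetical, and the sketch considers only group membership, not three-dimensional structure or function.

```python
# Illustrative sketch only: classify substitutions using the eight groups of [0054].
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original, replacement):
    """True if the two different residues share at least one group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference, variant):
    """List (label, is_conservative) for each differing position (1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (no indels in this sketch)")
    results = []
    for pos, (ref, var) in enumerate(zip(reference.upper(), variant.upper()), start=1):
        if ref != var:
            results.append((f"{ref}{pos}{var}", is_conservative(ref, var)))
    return results


if __name__ == "__main__":
    print(classify_substitutions("MDKL", "MEKW"))
```

Methionine appears in two of the groups, so in this sketch it is treated as a conservative replacement for members of either group.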

[0053] Also included in the current disclosure are variants of any of the enzymes, polypeptides, proteins, or domains described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme, polypeptide, protein, or domain (e.g. decreased-activity variants). In some embodiments, a decreased-activity variant of an enzyme, polypeptide, protein, or domain described herein comprises a disrupting substitution of at least one, at least two, three, four, five, six, or all catalytic residues. In some embodiments, any of the endonucleases described herein can comprise a nickase mutation. In some embodiments, any of the endonucleases described herein can comprise a RuvC domain lacking nuclease activity. In some embodiments, any of the endonucleases described herein can be configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, any of the endonucleases described herein can be configured to lack endonuclease activity or be catalytically dead.

[0054] Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition (December 1993)), which is incorporated herein in its entirety for all purposes). The following eight groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Glycine (G);

2) Aspartic acid (D), Glutamic acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);

7) Serine (S), Threonine (T); and

8) Cysteine (C), Methionine (M).

Example Embodiments

[0055] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising (a) a first fragment comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide or enzyme (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); (b) a heterologous domain; and (c) a second fragment comprising RuvC and Nuc domains of, derived from, or obtained from a second Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme). In some embodiments, the Cas12 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site. In some embodiments, the second fragment further comprises a REC2 domain of, derived from, or obtained from a second Cas12 polypeptide. In some embodiments, the first fragment and the second fragment are derived from a same Cas12 polypeptide (e.g. the first and the second Cas12 polypeptide are the same). In some embodiments, the first fragment and the second fragment are derived from different Cas12 polypeptides (e.g. the first and the second Cas12 polypeptide are different). In some embodiments, the first fragment and the second fragment do not comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide. In some embodiments, the first fragment and the second fragment do comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide. In some embodiments, the fusion protein further comprises a linker between the first fragment and the heterologous domain, or between the heterologous domain and the second fragment. In some embodiments, the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond. In some embodiments, the composition further comprises an insert DNA molecule. In some embodiments, the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide.
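
The fragment order described above (first fragment, optional linker, heterologous domain, optional linker, second fragment) can be sketched in Python as follows. This is an illustrative design helper under stated assumptions: the class and function names are hypothetical, the amino acid strings are placeholders rather than sequences of this disclosure, and the minimum of 100 residues for the heterologous domain reflects the "at least about 100 amino acids" recited for that domain.

```python
# Illustrative sketch only: assemble a fusion design from placeholder fragments.
from dataclasses import dataclass


def flexible_linker(unit="GGGGS", n=3):
    """Build a repeat linker such as (GGGGS)n or (GGGS)n."""
    return unit * n


@dataclass
class FusionDesign:
    first_fragment: str       # placeholder for the WED + REC1 portion
    heterologous: str         # placeholder for e.g. a polymerase or topoisomerase domain
    second_fragment: str      # placeholder for the RuvC + Nuc portion
    linker_n: str = ""        # optional linker before the heterologous domain
    linker_c: str = ""        # optional linker after the heterologous domain

    def validate(self, min_heterologous=100):
        """Raise ValueError if the heterologous domain is shorter than the minimum."""
        if len(self.heterologous) < min_heterologous:
            raise ValueError(
                f"heterologous domain has {len(self.heterologous)} residues; "
                f"expected at least {min_heterologous}"
            )

    def sequence(self):
        """Concatenate the parts in N-to-C order."""
        return (self.first_fragment + self.linker_n + self.heterologous
                + self.linker_c + self.second_fragment)


if __name__ == "__main__":
    design = FusionDesign(
        first_fragment="M" + "A" * 9,    # toy sequence, not a real Cas12 fragment
        heterologous="K" * 120,          # toy sequence, not a real enzyme domain
        second_fragment="E" * 10,        # toy sequence, not a real Cas12 fragment
        linker_n=flexible_linker("GGGGS", 2),
        linker_c=flexible_linker("GGGS", 1),
    )
    design.validate()
    print(len(design.sequence()))
```

Non-peptide linkages mentioned above, such as a biotin-streptavidin pair or a small molecule linker, are outside what a simple string concatenation of this kind can represent.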

[0056] The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g. Streptococcus pyogenes, Parageobacillus thermoglucosidasius, an archaeon, Candidatus Micrarchaeota (archaeon), Candidatus Aureabacteria (bacterium), Acidibacillus sulfuroxidans, Ruminococcus, Syntrophomonas palmitatica, Clostridium novyi, or any combination thereof. In some embodiments, the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) is a Class 2, Type V-F or Cas12f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, REC1 c residues 134-152, or H139 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID Nos: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
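
Residue positions stated "relative to" a reference sequence (e.g. when a polypeptide is optimally aligned to the reference) can be located in another polypeptide by walking a pairwise alignment column by column. The Python sketch below assumes the alignment has already been computed and is supplied as two equal-length gapped strings. The function name and the toy alignment are hypothetical; the actual sequences of the sequence listing are not reproduced here.

```python
# Illustrative sketch only: map reference residue numbers onto an aligned query.
def map_reference_positions(aligned_ref, aligned_query, positions):
    """Map 1-based reference positions to the aligned query.

    Returns {ref_pos: (ref_residue, query_pos, query_residue)}; query_pos and
    query_residue are None when the query has a gap at that column.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    mapping = {}
    ref_pos = query_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_pos += 1
        if ref_char != "-" and ref_pos in wanted:
            if query_char == "-":
                mapping[ref_pos] = (ref_char, None, None)
            else:
                mapping[ref_pos] = (ref_char, query_pos, query_char)
    return mapping


if __name__ == "__main__":
    # Toy alignment: reference MDKEL against query MDAKL.
    print(map_reference_positions("MD-KEL", "MDAK-L", [2, 3, 4]))
```

A residue reported with a gap in the query corresponds to the case described above in which the polypeptide lacks the residue, and a differing query residue corresponds to a substitution at that position.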

[0057] The heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some cases, the heterologous domain comprises (e.g. consists of) at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some embodiments, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length to at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. The heterologous domain can comprise an enzyme. The heterologous domain can comprise a DNA-binding or a DNA-conjugating domain. The heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity. 
The heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst 2.0 polymerase domain), a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof. Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Doublie et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes. In some cases, a T7 DNA polymerase domain according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46. In some embodiments, a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 46, or a variant thereof. Example organization, structure, and function of large fragment Bst polymerase (e.g. SEQ ID NO: 45) can be found in e.g. Oscorbin et al. Comput Struct Biotechnol J. 2023 Sep 12;21:4519-4535. doi: 10.1016/j.csbj.2023.09.008. eCollection 2023, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0 such as SEQ ID Nos: 44 or 45) according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 45. In some embodiments, a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof. Example organization, structure, and function of phi29 polymerase can be found in, e.g. Del Prado et al. Sci Rep. 2019 Jan 29;9(1):923. doi: 10.1038/s41598-018-37513-7 and UniProtKB/Swiss-Prot accession no. P03680.1, both of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a phi29 polymerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) Y101, T189, Q180, 12, 14, 15, 59, 61, 62, 65, 66, 69, 122, 123, 128, 143, 148, 169, 196, 198, 249, 252, 253, 255, 364, 371, 383, 392, 393, 434, 437, 438, 455, 456, 457, 458, 498, or 500, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 51. 
In some embodiments, a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 51, or a variant thereof. Example organization, structure, and function of Taq polymerase can be found in, e.g. Eun. “Enzymology Primer for Recombinant DNA Technology” (Chapter 6, DNA polymerases). ISBN 978-0-12-243740-3 (Academic Press, 1996), Park et al. Mol Cells. 1997 Jun 30;7(3):419-24, and UniProtKB/Swiss-Prot accession no. P19821.1, all of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a Taq polymerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48. In some embodiments, a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof. Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9. doi: 10.1021/bi960178r, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a T4 polymerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47. In some embodiments, a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof. Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem. 1990 Aug 25;265(24):14579-91, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Klenow polymerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50. In some embodiments, a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 50, or a variant thereof. Example organization, structure, and function of Vent (e.g. T. litoralis) polymerase can be found in, e.g. Gardner et al. Nucleic Acids Res. 1999 Jun 15;27(12):2545-53. doi: 10.1093/nar/27.12.2545, which is incorporated by reference in its entirety herein for all purposes. In some embodiments, a Vent polymerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) A488, N494, S495, Y412, K490, N494, Q486, R487, Y496, or Y499 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 49 or a variant thereof. In some embodiments, a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.

[0058] The heterologous domain can comprise a topoisomerase domain. The heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof. Example organization and function of Type I (e.g. Type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes. In some cases, a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise an alanine or glycine substitution of one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85. Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60. doi: 10.1074/jbc.273.32.20252, which is incorporated by reference herein for all purposes. In some cases, a Type II topoisomerase according to the current disclosure can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E. coli eubacterial DNA topoisomerase III, S. cerevisiae yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof. The heterologous domain can comprise E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.

[0059] The insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods). In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g. consists of) at least about 4 bp or nucleotides to at least about 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. 
In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the insert DNA molecule further comprises a transgene. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. 
In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt) to at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt), or any range between these values. In some embodiments, the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments, the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end. In some embodiments, the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.

[0060] The guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable guide polynucleotide configured to hybridize to the DNA site (e.g. an RNA comprising a guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA comprising a region configured to hybridize to, or complementary to, the DNA site). In some embodiments, the guide polynucleotide further comprises a hybridization domain at a 3' end. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.
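By way of illustration only, the pairing between a 3' hybridization domain of a guide polynucleotide and the 3' end of an insert DNA molecule can be sketched as follows. The sketch assumes an RNA hybridization domain, strict Watson-Crick pairing, and full complementarity over the chosen length; none of these assumptions is required by the disclosure, and the function name `hybridization_domain_for` is hypothetical.

```python
# Illustrative sketch only: derives an RNA tail that would base-pair
# (antiparallel, Watson-Crick) with the 3' end of an insert DNA strand.
# The 1-50 nt bounds mirror the hybridization-domain lengths listed above.

# DNA base -> complementary RNA base
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def hybridization_domain_for(insert_dna: str, length: int) -> str:
    """Return the RNA sequence (5'->3') complementary to the last
    `length` nucleotides of `insert_dna` (given 5'->3')."""
    insert_dna = insert_dna.upper()
    if set(insert_dna) - set("ACGT"):
        raise ValueError("insert_dna must contain only A, C, G, T")
    if not 1 <= length <= min(50, len(insert_dna)):
        raise ValueError("length must be 1-50 nt and no longer than the insert")
    three_prime_end = insert_dna[-length:]
    # Complement each base, then reverse to read the tail 5'->3'.
    return three_prime_end.translate(_DNA_TO_RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(hybridization_domain_for("AAACCCGGGT", 4))  # ACCC
```

A partially complementary or DNA-containing hybridization domain would need a different scoring rule than the exact reverse complement used here.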

[0061] The composition can comprise a pharmaceutically acceptable excipient. The excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle). In some embodiments, a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.

[0062] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme) configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); or (ii) a sequence comprising RuvC, REC2, and Nuc domains of, derived from, or obtained from a second Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); and (c) a third segment comprising a heterologous domain of at least about 100 amino acids. In some embodiments, the first segment further comprises WED, REC1, RuvC, REC2, and Nuc domains of, derived from, or obtained from the first Cas12 polypeptide. In some embodiments, the fusion protein further comprises a linker between (a), (b), or (c). In some embodiments, the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond. In some embodiments, the composition further comprises an insert DNA molecule. In some embodiments, the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide).

[0063] The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g. Streptococcus pyogenes, Parageobacillus thermoglucosidasius, an archaeon, Candidatus Micrarchaeota (archaeon), Candidatus Aureabacteria (bacterium), Acidibacillus sulfuroxidans, Ruminococcus, Syntrophomonas palmitatica, Clostridium novyi, or any combination thereof. In some embodiments, the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) is a Class 2, Type V-F or Cas12f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, REC1 c residues 134-152, or H139 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID Nos: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
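For illustration, a percent sequence identity of the kind recited above can be computed from two already-aligned sequences as in the following minimal sketch. The disclosure does not fix an alignment algorithm or a denominator; the sketch assumes the alignment has been produced elsewhere, uses `-` as the gap character, and divides by the number of aligned columns that are not gaps in both sequences. The names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
# Illustrative sketch only: percent identity over a precomputed pairwise
# alignment. Gap handling and the denominator are assumptions, not
# definitions taken from the disclosure.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the
    same residue; columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(
    aligned_a: str, aligned_b: str, threshold: float = 80.0
) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Other conventions (for example dividing by the length of the shorter sequence) give different percentages for the same alignment, so the convention in use should be stated alongside any reported value.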

[0064] The heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some cases, the heterologous domain comprises (e.g. consists of) at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some embodiments, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length to at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. The heterologous domain can comprise an enzyme. The heterologous domain can comprise a DNA-binding or a DNA-conjugating domain. The heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity. 
The heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst 2.0 polymerase domain), a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof. Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Doublie et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes. In some cases, a T7 DNA polymerase domain according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46. In some embodiments, a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 46, or a variant thereof. Example organization, structure, and function of large fragment Bst polymerase (e.g. SEQ ID NO: 45) can be found in e.g. Oscorbin et al. Comput Struct Biotechnol J. 2023 Sep 12;21:4519-4535. doi: 10.1016/j.csbj.2023.09.008. eCollection 2023, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0 such as SEQ ID Nos: 44 or 45) according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 45. In some embodiments, a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof. Example organization, structure, and function of phi29 polymerase can be found in, e.g. Del Prado et al. Sci Rep. 2019 Jan 29;9(1):923. doi: 10.1038/s41598-018-37513-7 and UniProtKB/Swiss-Prot accession no. P03680.1, both of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a phi29 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y101, T189, Q180, 12, 14, 15, 59, 61, 62, 65, 66, 69, 122, 123, 128, 143, 148, 169, 196, 198, 249, 252, 253, 255, 364, 371, 383, 392, 393, 434, 437, 438, 455, 456, 457, 458, 498, or 500, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 51. In some embodiments, a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 51, or a variant thereof. Example organization, structure, and function of Taq polymerase can be found in, e.g. Eun. “Enzymology Primer for Recombinant DNA Technology” (Chapter 6, DNA polymerases). ISBN 978-0-12-243740-3 (Academic Press, 1996), Park et al. Mol Cells. 1997 Jun 30;7(3):419-24, and UniProtKB/Swiss-Prot accession no. P19821.1, all of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a Taq polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48. In some embodiments, a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof. Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9. doi: 10.1021/bi960178r, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a T4 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47.
In some embodiments, a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof. Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem. 1990 Aug 25;265(24):14579-91, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Klenow polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50. In some embodiments, a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 50, or a variant thereof. Example organization, structure, and function of Vent (e.g. T. litoralis) polymerase can be found in, e.g. Gardner et al. Nucleic Acids Res. 1999 Jun 15;27(12):2545-53. doi: 10.1093/nar/27.12.2545, which is incorporated by reference in its entirety herein for all purposes. In some embodiments, a Vent polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) A488, N494, S495, Y412, K490, N494, Q486, R487, Y496, or Y499 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 49 or a variant thereof. In some embodiments, a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.

[0065] The heterologous domain can comprise a topoisomerase domain. The heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof. Example organization and function of Type I (e.g. Type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes. In some cases, a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85. Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60. doi: 10.1074/jbc.273.32.20252, which is incorporated by reference herein for all purposes. In some cases, a Type II topoisomerase according to the current disclosure can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E. coli eubacterial DNA topoisomerase III, S. cerevisiae yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof. The heterologous domain can comprise E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.

[0066] The insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods). In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g. consists of) at least about 4 bp or nucleotides to at least about 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the insert DNA molecule further comprises a transgene. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt) to at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt), or any range between these values. In some embodiments, the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments, the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end. In some embodiments, the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.
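As a non-limiting illustration of the flanked arrangement described above, the following sketch assembles an insert DNA sequence from a 5' complementarity region, a transgene, and a 3' complementarity region, and checks each part against the broadest length bounds listed in this paragraph (about 4 to 400 nucleotides for each flanking region, about 100 to 20,000 for the transgene). The class name `InsertDesign`, its fields, and the hard cut-offs are hypothetical; the disclosure recites "about" values and also contemplates inserts with only one flanking region.

```python
# Illustrative sketch only: lays out an insert as
# 5' complementarity region + transgene + 3' complementarity region
# and reports parts whose length falls outside the assumed bounds.
from dataclasses import dataclass
from typing import List

ARM_MIN, ARM_MAX = 4, 400                   # flanking region bounds (nt)
TRANSGENE_MIN, TRANSGENE_MAX = 100, 20_000  # transgene bounds (nt)


@dataclass(frozen=True)
class InsertDesign:
    five_prime_arm: str
    transgene: str
    three_prime_arm: str

    def validate(self) -> List[str]:
        """Return a list of length problems; empty if none are found."""
        problems = []
        arms = (("5' region", self.five_prime_arm),
                ("3' region", self.three_prime_arm))
        for name, arm in arms:
            if not ARM_MIN <= len(arm) <= ARM_MAX:
                problems.append(
                    f"{name}: {len(arm)} nt is outside {ARM_MIN}-{ARM_MAX} nt")
        if not TRANSGENE_MIN <= len(self.transgene) <= TRANSGENE_MAX:
            problems.append(
                f"transgene: {len(self.transgene)} nt is outside "
                f"{TRANSGENE_MIN}-{TRANSGENE_MAX} nt")
        return problems

    def sequence(self) -> str:
        """Concatenate the three parts in 5'->3' order."""
        return self.five_prime_arm + self.transgene + self.three_prime_arm


if __name__ == "__main__":
    design = InsertDesign("A" * 20, "ACGT" * 50, "C" * 20)
    print(design.validate(), len(design.sequence()))  # [] 240
```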

[0067] The guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first Cas12 polypeptide or second Cas12 polypeptide, either intact or part of segments described herein) can be any suitable guide polynucleotide (e.g. an RNA comprising a guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA). In some embodiments, the guide polynucleotide further comprises a hybridization domain at a 3' end. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.

[0068] The composition can comprise a pharmaceutically acceptable excipient. The excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle). In some embodiments, a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.

[0069] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise a same Cas12 polypeptide. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise different Cas12 polypeptides. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
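As a purely illustrative aid, the domain architecture recited above (a first fragment with WED and REC1 domains, a heterologous domain of at least about 100 amino acids, and a second fragment with RuvC and Nuc domains, optionally with REC2) can be written down as a small data structure. The following Python sketch is not part of the disclosed methods; the class name, field names, and the treatment of "at least about 100" as a hard cutoff of 100 are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Domain names follow the text; everything else here is hypothetical.
REQUIRED_FIRST = frozenset({"WED", "REC1"})
REQUIRED_SECOND = frozenset({"RuvC", "Nuc"})
MIN_HETEROLOGOUS_AA = 100  # assumption: "at least about 100" treated as a hard cutoff


@dataclass(frozen=True)
class SplitCas12Fusion:
    """Hypothetical record of a split-Cas12 fusion layout (N- to C-terminus)."""
    first_fragment_domains: FrozenSet[str]
    heterologous_length_aa: int
    second_fragment_domains: FrozenSet[str]

    def problems(self) -> List[str]:
        """Return a list of ways this layout departs from the recited architecture."""
        issues = []
        if not REQUIRED_FIRST <= self.first_fragment_domains:
            issues.append("first fragment lacks WED and/or REC1")
        if self.heterologous_length_aa < MIN_HETEROLOGOUS_AA:
            issues.append("heterologous domain shorter than 100 aa")
        if not REQUIRED_SECOND <= self.second_fragment_domains:
            issues.append("second fragment lacks RuvC and/or Nuc")
        return issues


# Example layouts (lengths are made-up values, not taken from any SEQ ID NO).
ok = SplitCas12Fusion(frozenset({"WED", "REC1"}), 600, frozenset({"REC2", "RuvC", "Nuc"}))
bad = SplitCas12Fusion(frozenset({"WED"}), 50, frozenset({"RuvC", "Nuc"}))
print(ok.problems())
print(bad.problems())
```

The optional REC2 domain is accepted in the second fragment because the check only requires that RuvC and Nuc be present, mirroring the "further comprises" language of the embodiments.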

[0070] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise a same Cas12 polypeptide. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise different Cas12 polypeptides. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
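The hybridization relationship between the guide polynucleotide and the 3' end of the insert DNA can be illustrated with a short complementarity check. The Python sketch below assumes, for illustration only, that the guide is RNA, that pairing is strictly Watson-Crick, and that the match must be perfect; the function names and both sequences are hypothetical and are not taken from the sequence listing.

```python
# RNA base -> the DNA base it pairs with (Watson-Crick only; no wobble pairs).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_complement_of_rna(rna_5to3: str) -> str:
    """Return the DNA strand, written 5'->3', that fully base-pairs with the RNA."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(rna_5to3.upper()))


def insert_end_hybridizes(guide_hyb_domain: str, insert_dna: str) -> bool:
    """True if the 3' end of the insert is perfectly complementary to the domain."""
    return insert_dna.upper().endswith(dna_complement_of_rna(guide_hyb_domain))


guide_hyb = "GGAUCCGAUACG"  # hypothetical 12-nt hybridization domain (RNA, 5'->3')
insert = "ATGCATGCATGC" + dna_complement_of_rna(guide_hyb)  # hypothetical insert DNA

print(insert_end_hybridizes(guide_hyb, insert))              # matching 3' end
print(insert_end_hybridizes(guide_hyb, insert[:-1] + "A"))   # last base mutated
```

A practical design would likely tolerate mismatches and consider melting temperature; the exact-match test here is only the simplest way to express the stated relationship.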

[0071] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the Cas12 polypeptide, the first Cas12 polypeptide, or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, said cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.

[0072] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.

[0073] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.

Table 1: Sequences of Genes and Components Described Herein (change in text style format denotes domain boundaries below)

EXAMPLES

[0074] Example 1. - Testing activity of split Cas12f complexes

[0075] This example demonstrates that Cas12f group enzymes can be rearranged into a split-domain format, with a heterologous domain inserted between the N- and C-terminal domains, to allow for e.g. new enzymatic activity while preserving specific guided DNA cleavage activity.

[0076] Protein expression and purification

[0077] All constructs were expressed and purified using the same method. Recombinant protein coding sequences were cloned into the pET45 (EMD Millipore) vector, and the vector was transformed into BL21(DE3)pLysS E. coli (ThermoFisher) for expression. The protein was expressed at 20°C for 48h using Overnight Express Instant TB Media (EMD Millipore). After incubation, the E. coli biomass was harvested by centrifugation for 2 min at 4500 RCF and frozen at -80°C. The biomass was lysed with BugBuster protein extraction reagent (EMD Millipore) supplemented with 90U rLysozyme per 10mL of lysate (EMD Millipore), 1 tablet of protease inhibitor per 10mL of lysate (Pierce Protease Inhibitor Mini Tablets, EDTA-free, from Thermo Scientific), 50mM sodium phosphate pH7.7, 0.05% TritonX, and 2.5mM TCEP. Lysis was conducted at 12°C for 45 min. Next, the lysate was mixed 1:1 with dilution buffer (50mM sodium phosphate pH7.7, 1M NaCl, 0.05% TritonX, 2.5mM TCEP) and incubated for 45 min at 12°C. After incubation, the preparation was centrifuged at 14000rpm for 1h at 8°C. Purification was conducted using a batch method with His-Affinity Gel (Zymo Research), and included a loading procedure, a wash procedure, and an elution procedure. The wash buffer included 50mM sodium phosphate, 0.5M NaCl, 30mM imidazole, 0.05% TritonX, and 2.5mM TCEP. The protein was eluted using 50mM sodium phosphate pH7.7, 300mM NaCl, 300mM imidazole, 0.05% TritonX, and 2.5mM TCEP. Finally, the eluted protein was dialyzed at room temperature for 3h using a Slide-A-Lyzer Dialysis Cassette G2 (Thermo Scientific), where the dialysis buffer included 20mM Tris-HCl pH7.5, 300mM NaCl, 0.05% TritonX, and 2mM DTT. The purification products were analyzed by SDS-PAGE (see Figures 7 and 8, top panels) and quantified using the Qubit Protein Assay (ThermoFisher).
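The reagent amounts in this protocol scale with lysate volume, and the 1:1 dilution step changes the salt concentration before binding. The Python sketch below works through that arithmetic. It assumes, only for illustration, that the lysate itself contributes no NaCl (the salt content of the extraction reagent is not stated here); the function names and the 25 mL example volume are hypothetical.

```python
def lysis_additives(lysate_ml: float) -> dict:
    """Scale the per-10-mL additives from the protocol to a given lysate volume."""
    return {
        "rLysozyme_U": 90.0 * lysate_ml / 10.0,            # 90 U per 10 mL
        "protease_inhibitor_tablets": lysate_ml / 10.0,    # 1 tablet per 10 mL
    }


def mix(conc_a: dict, conc_b: dict, parts_a: float = 1.0, parts_b: float = 1.0) -> dict:
    """Volume-weighted final concentrations after combining two solutions."""
    total = parts_a + parts_b
    keys = set(conc_a) | set(conc_b)
    return {
        k: (conc_a.get(k, 0.0) * parts_a + conc_b.get(k, 0.0) * parts_b) / total
        for k in keys
    }


# ASSUMPTION: NaCl in the lysate is taken as 0 mM; the true value is not given.
lysate = {"phosphate_mM": 50.0, "NaCl_mM": 0.0, "TritonX_pct": 0.05, "TCEP_mM": 2.5}
dilution_buffer = {"phosphate_mM": 50.0, "NaCl_mM": 1000.0, "TritonX_pct": 0.05, "TCEP_mM": 2.5}

additives = lysis_additives(25.0)      # hypothetical 25 mL of lysate
final = mix(lysate, dilution_buffer)   # 1:1 mixing as in the protocol

print(additives)
print(sorted(final.items()))
```

Under that assumption the diluted lysate is at about 0.5 M NaCl, which matches the NaCl concentration of the wash buffer; the other listed components are unchanged because both solutions contain them at the same concentration.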

Target cleavage - in vitro assay

[0078] Following purification of the split Cas12f constructs, ribonucleoprotein complex formation was assessed by testing for proper Cas function and DNA cleavage.

[0079] First, ribonucleoprotein complexes were constructed from the purified proteins. In this experiment six different variants were tested - SpCas12f (SEQ ID NO: 70), PtCas12f (SEQ ID NO: 71), SpCas12f-inter (SEQ ID NO: 9), PtCas12 C-tag (SEQ ID NO: 72) and PtCas12f N-tag (SEQ ID NO: 73). The complex formation was conducted at 37°C for 30 min. The reaction included: 1 µM gRNA, 1 µM Cas variant, 14mM Tris-HCl pH 7.5, 80mM NaCl, 1mM DTT and 0.01% TritonX. gRNA_Sp (SEQ ID NO: 56) was used in the reactions with enzymes including SpCas12f components, whereas gRNA_Pt (SEQ ID NO: 57) was used in the reactions with enzymes including PtCas12f components.
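Setting up the complex-formation reaction at equal gRNA and Cas concentrations is a standard C1V1 = C2V2 dilution. The Python sketch below shows that calculation in unit-agnostic form (any concentration unit works as long as stock and final values share it). The stock concentrations and the 20 µL reaction volume are hypothetical values chosen for illustration; they are not given in this example.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, reaction_ul: float) -> float:
    """Volume of stock needed so the component ends up at final_conc (C1*V1 = C2*V2)."""
    if stock_conc < final_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * reaction_ul / stock_conc


REACTION_UL = 20.0      # hypothetical reaction volume
FINAL_EACH = 1.0        # gRNA and Cas at the same final concentration (1:1)
GRNA_STOCK = 10.0       # hypothetical stock, same unit as FINAL_EACH
CAS_STOCK = 5.0         # hypothetical stock, same unit as FINAL_EACH

grna_ul = stock_volume_ul(FINAL_EACH, GRNA_STOCK, REACTION_UL)
cas_ul = stock_volume_ul(FINAL_EACH, CAS_STOCK, REACTION_UL)
remaining_ul = REACTION_UL - grna_ul - cas_ul   # left for buffer components and water

print(grna_ul, cas_ul, remaining_ul)
```

The remaining volume would be filled with buffer stocks and water to reach the listed Tris-HCl, NaCl, DTT, and TritonX concentrations, each computed the same way.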

[0080] After reconstitution as ribonucleoprotein complexes, the complexes above were used in DNA cleavage reactions. The reactions included: 0.7 µM Cas-gRNA ribonucleoprotein complex, 12.5 mM Tris-HCl pH7.5, 53 mM NaCl, 1 mM DTT, 0.01% TritonX, 5 mM MgCl2, and 10 nM target DNA (DNA_Sp cleavage substrate or DNA_Pt cleavage substrate). gRNA_Pt and gRNA_Sp were generated before the reactions using HiScribe T7 and the Monarch RNA Cleanup Kit (both NEB) according to the manufacturer's protocol. The DNA_Sp cleavage substrate (target DNA) was used in the reactions with enzymes including SpCas12f components, while DNA_Pt was used in the reactions with enzymes including PtCas12f components; both target DNA substrates were 513bp long and included the target sequence AGTTGACCCAACGTCGCCGG. The reaction was conducted at 37°C for 1h. The products of the reaction were analyzed using agarose gel electrophoresis. Successful cleavage reactions generated two products: ~215bp and ~298bp.
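The expected band pattern follows from where the cut falls within the 513bp substrate: the two product sizes must sum to the substrate length (215 + 298 = 513). The Python sketch below illustrates that check. The position of the target within the substrate, the flanking sequence (shown as N), and the cut offset are placeholders chosen only so that the output reproduces the reported sizes; none of them is taken from the substrate sequences of this example.

```python
TARGET = "AGTTGACCCAACGTCGCCGG"   # target sequence stated in the example
SUBSTRATE_LEN = 513                # substrate length stated in the example


def cleavage_fragments(substrate: str, target: str, cut_offset: int) -> tuple:
    """Return the two fragment lengths for a single cut at target start + cut_offset."""
    start = substrate.find(target)
    if start < 0:
        raise ValueError("target sequence not found in substrate")
    cut = start + cut_offset
    if not 0 < cut < len(substrate):
        raise ValueError("cut position falls outside the substrate")
    return cut, len(substrate) - cut


# PLACEHOLDERS: a 195-nt left flank and a cut at the end of the target are assumed
# here only to reproduce the reported ~215bp and ~298bp products.
left_flank = 195
substrate = "N" * left_flank + TARGET + "N" * (SUBSTRATE_LEN - left_flank - len(TARGET))

fragments = cleavage_fragments(substrate, TARGET, cut_offset=len(TARGET))
print(fragments, sum(fragments) == SUBSTRATE_LEN)
```

With the real substrate sequences and the actual cut position of each enzyme, the same function would give the sizes to compare against the gel.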

[0081] As can be seen in the agarose gels in FIG. 5A and FIG. 5B (bottom panels), split-Cas12f variants tested retained activity similar to wild-type, as they were able to cleave the target DNA fragments into appropriate sizes.

[0082] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.