Biosynthetic Enzymes

Field

This invention relates to the biosynthesis of complex triterpenoid saponins and intermediates, such as quillaic acid, and to genes and polypeptides involved in this biosynthesis.

Saponaria officinalis (family Caryophyllaceae), commonly known as soapwort, is a perennial flowering plant native to Europe and Asia that has been used as a traditional source of soap [1]. The well-known detergent property of soapwort is due to the high content of amphiphilic saponins in the plant extract. The ancient Greeks, Romans and Egyptians used soapwort extracts to clean and wash clothing, and the first American colonists later brought soapwort plants from Europe to North America for household use [2]. In addition to their detergent properties, soapwort extracts have been used in folk medicine to treat conditions such as syphilis, gout, rheumatism and jaundice [3]. Soapwort extracts also play an important role in Middle Eastern culture, where they have been used to make tahini halvah, a traditional dessert [4]. Today, soapwort extracts are still used in cosmetic, nutraceutical and phytomedicinal products [5]. Additionally, the saponins of soapwort extracts have been investigated for their potential use in bioremediation and as food surfactants, and for their antifungal and immunotoxic activities [6-9].

S. officinalis is a rich source of saponins with various aglycone cores, such as quillaic acid, gypsogenin and gypsogenic acid. The major saponins found in soapwort extracts are reported to be saponariosides A and B (SpA, SpB) [1]. SpA and SpB are similar in chemical structure. Both are composed of a quillaic acid aglycone, a C-30 triterpenoid, decorated with a branched trisaccharide at the C-3 position and a linear tetrasaccharide at the C-28 position, in which the β-D-fucose is linked to a β-D-quinovose with an acetyl group attached.
The only chemical difference between SpA and SpB is the addition of a β-D-xylose to the quinovose moiety of the C-28 sugar chain in SpA. Interestingly, QS-21, a triterpenoid saponin found in Quillaja saponaria, bears a striking chemical resemblance to SpA and SpB (Figure 1).

QS-21 is a complex triterpenoid saponin synthesised by the Chilean tree Quillaja saponaria (order Fabales). Biochemically, QS-21 consists of a C-30 triterpenoid quillaic acid backbone. This scaffold is decorated with a branched trisaccharide at the C-3 position and a linear tetrasaccharide at the C-28 position. The β-D-fucose sugar within the tetrasaccharide also features a C-18 acyl chain which is glycosylated with an arabinose sugar. QS-21 is a potent immunostimulatory agent capable of enhancing antibody responses and boosting specific T-cell responses, giving it significant adjuvant potential (Del Giudice et al. Seminars in Immunology, 2018. 39: p. 14-21; Marciani, D.J. Trends in Pharmacological Sciences, 2018. 39(6): p. 573-585). The AS01 adjuvant is a liposomal formulation of QS-21 and 3-O-desacyl-4'-monophosphoryl lipid A (Del Giudice et al. supra).

Despite the promising commercial potential of saponariosides and their intermediates, nothing is known of their biosynthetic pathway. The biosynthesis of saponariosides can be conceptually divided into two stages: (i) the biosynthesis of the quillaic acid core and (ii) the decoration of quillaic acid (Figure 2). However, the actual order may differ in planta and further details are unknown.

Many plant natural products are present in low abundance in the plant, and chemical synthesis is often not viable because of their complex chemical structures. Knowledge of the biosynthetic pathway may allow metabolic engineering in an alternative host system, enabling large-scale production of the compound of interest.
The present inventors have identified and characterised the genes involved in the biosynthesis of complex triterpenoid saponins in the soapwort plant (Saponaria officinalis). These include genes encoding enzymes involved in the biosynthesis of quillaic acid (QA) and glycosyl transferases involved in the glycosylation of QA. Expression of one or more of these genes may be useful in the production of QA and glycosylation products of QA.

A first aspect of the invention provides a method for the production of a triterpenoid comprising one or more of;
(i) contacting 2,3-oxidosqualene (OS) with a Saponaria officinalis β-amyrin synthase (SobAS) comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 8, such that said OS is converted into β-amyrin;
(ii) either;
(a) contacting β-amyrin with a SoC28 oxidase polypeptide comprising an amino acid sequence having at least 80% sequence identity to SEQ ID NO: 2, such that the C28 position of said β-amyrin is oxidised to a carboxylic acid to produce oleanolic acid, and contacting oleanolic acid with a SoC28C16 oxidase polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 4, such that the C16 position of said oleanolic acid is oxidised to an alcohol, thereby producing echinocystic acid; or
(b) contacting β-amyrin with a SoC28C16 oxidase polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 4, such that the C28 position of said β-amyrin is oxidised to a carboxylic acid and the C16 position of said β-amyrin is oxidised to an alcohol, thereby producing echinocystic acid;
(iii) contacting echinocystic acid with a SoC23 oxidase polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 6, such that the C-23 position of said echinocystic acid is oxidised to an aldehyde, thereby producing quillaic acid (QA);
(iv) contacting QA with a Saponaria officinalis QA 3-O-glucuronosyl transferase SoCSL polypeptide comprising an amino acid sequence
having at least 60% sequence identity to SEQ ID NO: 10, such that said QA is converted into QA-GlcA;
(v) contacting QA-GlcA with a Saponaria officinalis QA-GlcA galactosyl transferase SoC3Gal polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 12, such that said QA-GlcA is converted into QA-GlcA-Gal;
(vi) contacting QA-GlcA-Gal with a Saponaria officinalis QA-GlcA-Gal xylosyl transferase SoC3Xyl polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 14, such that said QA-GlcA-Gal is converted into QA-Tri (QA-GlcA-Gal-Xyl);
(vii) contacting QA-Tri with a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu polypeptide comprising an amino acid sequence having at least 60% sequence identity to SEQ ID NO: 16, such that said QA-Tri is converted into QA-TriF;
(viii) contacting QA-TriF with a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 18, such that said QA-TriF is converted into QA-TriFR;
(ix) contacting QA-TriFR with a Saponaria officinalis QA-TriFR xylosyl transferase SoC28Xyl1 polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 20, such that said QA-TriFR is converted into QA-TriFRX;
(x) contacting QA-TriFRX with a Saponaria officinalis QA-TriFRX xylosyl transferase SoC28Xyl2 polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 22, such that said QA-TriFRX is converted into QA-TriFRXX;
(xi) contacting QA-TriFRXX with a Saponaria officinalis QA-TriFRXX quinovosyl transferase SoGH1 polypeptide comprising an amino acid sequence having at least 50% sequence identity to SEQ ID NO: 34, such that said QA-TriFRXX is converted into QA-TriF(Q)RXX; and/or
(xii) contacting QA-TriF(Q)RXX with a Saponaria officinalis QA-TriF(Q)RXX acetyl transferase SoBAHD1 polypeptide comprising an amino acid sequence having at least 50% sequence
identity to SEQ ID NO: 36, such that said QA-TriF(Q)RXX is converted into saponarioside B (SpB).

A method of the first aspect may comprise;
contacting β-amyrin with a Saponaria officinalis C28 oxidase (SoC28 oxidase) to oxidise the C28 position of the β-amyrin to a carboxylic acid to form oleanolic acid, wherein the amino acid sequence of the SoC28 oxidase has at least 80% sequence identity to SEQ ID NO: 2;
contacting the oleanolic acid with a Saponaria officinalis C28C16 oxidase (SoC28C16 oxidase) to oxidise the C16 position of the oleanolic acid to an alcohol to form echinocystic acid, wherein the amino acid sequence of the SoC28C16 oxidase has at least 50% sequence identity to SEQ ID NO: 4; and
contacting the echinocystic acid with a Saponaria officinalis C-23 oxidase (SoC23 oxidase) to oxidise the C-23 position of echinocystic acid to an aldehyde to form quillaic acid (QA), wherein the amino acid sequence of the SoC23 oxidase has at least 50% sequence identity to SEQ ID NO: 6.

A method of the first aspect may comprise;
contacting β-amyrin with a Saponaria officinalis C28C16 oxidase (SoC28C16 oxidase) to oxidise the C28 position of the β-amyrin to a carboxylic acid and the C16 position to an alcohol to form echinocystic acid, wherein the amino acid sequence of the SoC28C16 oxidase has at least 50% sequence identity to SEQ ID NO: 4; and
contacting the echinocystic acid with a Saponaria officinalis C-23 oxidase (SoC23 oxidase) to oxidise the C-23 position of echinocystic acid to an aldehyde to form quillaic acid (QA), wherein the amino acid sequence of the SoC23 oxidase has at least 50% sequence identity to SEQ ID NO: 6.
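The conversions recited in steps (i) to (xii) of the first aspect can be summarised as an ordered table of enzyme, substrate, product, SEQ ID NO and minimum sequence identity. The following is a minimal illustrative sketch only: the `Step` record and the `route` helper are hypothetical names introduced here, route (ii)(a) is shown as two entries, the single-enzyme route (ii)(b) is omitted for simplicity, and the linear order reflects the order of recitation rather than any established order in planta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str         # step label as recited in the first aspect
    enzyme: str
    substrate: str
    product: str
    seq_id_no: int     # reference sequence for the enzyme
    min_identity: int  # minimum percent sequence identity recited


# Transcribed from steps (i) to (xii) of the first aspect (route (ii)(a) only).
STEPS = [
    Step("i", "SobAS", "2,3-oxidosqualene", "beta-amyrin", 8, 80),
    Step("ii-a1", "SoC28 oxidase", "beta-amyrin", "oleanolic acid", 2, 80),
    Step("ii-a2", "SoC28C16 oxidase", "oleanolic acid", "echinocystic acid", 4, 50),
    Step("iii", "SoC23 oxidase", "echinocystic acid", "QA", 6, 50),
    Step("iv", "SoCSL", "QA", "QA-GlcA", 10, 60),
    Step("v", "SoC3Gal", "QA-GlcA", "QA-GlcA-Gal", 12, 50),
    Step("vi", "SoC3Xyl", "QA-GlcA-Gal", "QA-Tri", 14, 50),
    Step("vii", "SoC28Fu", "QA-Tri", "QA-TriF", 16, 60),
    Step("viii", "SoC28Rha", "QA-TriF", "QA-TriFR", 18, 50),
    Step("ix", "SoC28Xyl1", "QA-TriFR", "QA-TriFRX", 20, 50),
    Step("x", "SoC28Xyl2", "QA-TriFRX", "QA-TriFRXX", 22, 50),
    Step("xi", "SoGH1", "QA-TriFRXX", "QA-TriF(Q)RXX", 34, 50),
    Step("xii", "SoBAHD1", "QA-TriF(Q)RXX", "SpB", 36, 50),
]


def route(start: str, end: str) -> list[Step]:
    """Follow substrate -> product links from `start` until `end` is reached."""
    by_substrate = {step.substrate: step for step in STEPS}
    path: list[Step] = []
    current = start
    while current != end:
        if current not in by_substrate:
            raise ValueError(f"no recited step consumes {current!r}")
        step = by_substrate[current]
        path.append(step)
        current = step.product
    return path


if __name__ == "__main__":
    for step in route("beta-amyrin", "QA"):
        print(f"({step.label}) {step.substrate} -> {step.product} "
              f"via {step.enzyme} (SEQ ID NO: {step.seq_id_no}, >= {step.min_identity}%)")
```

Calling `route` with any recited intermediate as the start and any later intermediate as the end lists the enzymes that a given embodiment would need to combine.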
A method of the first aspect may comprise or further comprise;
contacting QA with a Saponaria officinalis QA 3-O-glucuronosyl transferase SoCSL to covalently attach D-glucuronic acid (GlcA) to the 3-O position of quillaic acid to form 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA or QA-Mono), wherein the amino acid sequence of the SoCSL has at least 60% sequence identity to SEQ ID NO: 10;
contacting QA-GlcA with a Saponaria officinalis QA-GlcA galactosyl transferase SoC3Gal to covalently attach D-galactose (Gal) via a 1->2 linkage to QA-GlcA to form 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-Gal or QA-Di), wherein the amino acid sequence of the SoC3Gal has at least 50% sequence identity to SEQ ID NO: 12; and
contacting QA-GlcA-Gal with a Saponaria officinalis QA-GlcA-Gal xylosyl transferase SoC3Xyl to covalently attach D-xylose (Xyl) to QA-GlcA-Gal to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-Tri or QA-GlcA-Gal-Xyl), wherein the amino acid sequence of SoC3Xyl has at least 50% sequence identity to SEQ ID NO: 14.
A method of the first aspect may comprise or further comprise;
contacting 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-Tri) with a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu to attach fucose to the 28-O position of QA-Tri to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-fucopyranosyl ester}-quillaic acid (QA-TriF), wherein the amino acid sequence of SoC28Fu has at least 60% sequence identity to SEQ ID NO: 16;
contacting QA-TriF with a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha to covalently attach rhamnose via a 1,2 linkage to QA-TriF to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFR), wherein the amino acid sequence of SoC28Rha has at least 50% sequence identity to SEQ ID NO: 18;
contacting QA-TriFR with a Saponaria officinalis QA-TriFR xylosyl transferase SoC28Xyl1 to covalently attach xylose via a 1,4 linkage to QA-TriFR to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFRX), wherein the amino acid sequence of SoC28Xyl1 has at least 50% sequence identity to SEQ ID NO: 20; and
contacting QA-TriFRX with a Saponaria officinalis QA-TriFRX xylosyl transferase SoC28Xyl2 to covalently attach xylose via a 1,3 linkage to QA-TriFRX to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX), wherein the amino acid sequence of SoC28Xyl2 has at least 50% sequence identity to SEQ ID NO: 22.
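Each polypeptide above is defined by a minimum percentage sequence identity to a reference SEQ ID NO. By way of illustration only, the sketch below shows one simple way such a threshold might be checked for two sequences that have already been aligned. It assumes that identity is counted as identical residues divided by alignment columns (ignoring columns where both sequences have a gap); this convention, the function names and the toy sequences are assumptions made for the example and are not taken from the sequence listing. Producing the alignment itself is taken as given.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumed convention: identical residue pairs divided by the number of
    alignment columns, skipping columns in which both sequences are gapped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_query: str, aligned_reference: str,
                    minimum_percent: float) -> bool:
    """True if the aligned query reaches the recited minimum identity."""
    return percent_identity(aligned_query, aligned_reference) >= minimum_percent


if __name__ == "__main__":
    # Toy sequences for illustration; not derived from any SEQ ID NO.
    query, reference = "MKTA", "MKSA"
    print(percent_identity(query, reference))
    print(meets_threshold(query, reference, 50))
    print(meets_threshold(query, reference, 80))
```

Other conventions (for example, dividing by the length of the reference sequence) give different percentages for the same alignment, so the denominator should be fixed before a threshold is applied.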
A method of the first aspect may comprise or further comprise; contacting QA-TriFRXX with a Saponaria officinalis QA- to covalently attach quinovose via a 1,4 linkage to QA-TriFRXX to form 3-O- -D-xylopyranosyl- - -D- galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -l-rhamnopyranosyl- - -d-quinovopyranosyl- - -d-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX), wherein the amino acid sequence of SoGH1 has at least 50% sequence identity to SEQ ID NO: 34; and/or contacting QA-TriF(Q)RXX with a Saponaria officinalis QA-TriF(Q)RXX acetyl transferase QA-TriF(Q)RXX to form saponarioside B, wherein the amino acid sequence of SoBAHD1 has at least 50% sequence identity to SEQ ID NO: 36. A second aspect of the invention provides method of converting a host from a phenotype whereby the host is unable to carry out triterpenoid -amyrin to a phenotype whereby the host is able to carry out said triterpenoid biosynthesis, the method comprising; expressing a heterologous nucleic acid within the host or one or more cells thereof, following an earlier step of introducing the nucleic acid into the host or an ancestor of either, wherein the heterologous nucleic acid encodes one or more or all of the following polypeptides (i) a SoC28 oxidase -amyrin at the C28 SoC28 said SoC28 oxidase having at least 80% sequence identity to SEQ ID NO:2; (ii) a SoC28C16 oxidase -amyrin at the C28 position to a carboxylic acid and at the C16 SoC28 said C28C16 oxidase having at least 50% sequence identity to SEQ ID NO: 4; and (iii) a SoC23 oxidase capable of oxidising echinocystic acid at the C-23 position to an aldehyde ( SoC23 oxidase , SoC23 oxidase having at least 50% sequence identity to SEQ ID NO: 6, (iv) a Saponaria officinalis QA 3- SoCSL capable of attaching D- GlcA to the 3-O position of quillaic acid to form 3-O- -D-glucopyranosiduronic acid}- quillaic acid -Mono -GlcA ; said SoCSL having at least 60% sequence identity to SEQ ID NO: 10; (v) 
Saponaria officinalis QA-GlcA SoC3Gal capable of attaching D- Gal -1->2 linkage to QA-GlcA to form 3-O- -D-galactopyranosyl-(1->2)]- -D- glucopyranosiduronic acid}-quillaic acid -Di QA-GlcA-Gal ; wherein the amino acid sequence of the SoC3Gal has at least 50% sequence identity to SEQ ID NO: 12; (vi) a Saponaria officinalis QA-GlcA-Gal x SoC3Xyl capable of attaching D- Xyl QA-GlcA-Gal to form 1, 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal-Xyl QA-Tri ), wherein the amino acid sequence of SoC3Xyl has at least 50% sequence identity to SEQ ID NO: 14; (vii) a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu capable of attaching fucose Fuc to the 28-O position of QA-Tri to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid (QA-TriF), wherein the amino acid sequence of SoC28Fu has at least 50% sequence identity to SEQ ID NO: 16; (viii) a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha capable of attaching rhamnose Rha 1, 2 linkage to QA-TriF to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl- (1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFR); said SoC28Rha having at least 50% sequence identity to SEQ ID NO: 18; (ix) Saponaria officinalis QA-TriFR xyl SoC28Xyl1 capable of attaching D-Xylose Xyl 4 linkage to QA-TriFR to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX) ; wherein the amino acid sequence of the C28Xyl1 has at least 50% sequence identity to SEQ ID NO: 20; (x) a Saponaria officinalis QA-TriFRX x C28Xyl2 for attachment of D-Xylose Xyl QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- 
-D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl- (1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX), wherein the amino acid sequence of C28Xyl2 has at least 50% sequence identity to SEQ ID NO: 22, (xi) a Saponaria officinalis QA- for attachment of quinovose via a 1, 4 linkage to QA-TrFRXX to form 3-O- -D-xylopyranosyl- - -D-galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L- rhamnopyranosyl- - -dD-quinovopyranosyl- - -dD-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX), wherein the amino acid sequence of SoGH1 has at least 50% sequence identity to SEQ ID NO: 34, and/or (xii) a Saponaria officinalis QA- acetyl group to QA-TriF(Q)RXX to form saponarioside B, wherein the amino acid sequence of SoBAHD1 has at least 50% sequence identity to SEQ ID NO: 36. The heterologous nucleic acid in methods of the second aspect may encode the following polypeptides; (i) a SoC28 oxidase -amyrin at the C28 position to a carboxylic acid to form oleanolic acid; said SoC28 oxidase having at least 80% sequence identity to SEQ ID NO: 2; (ii) a SoC28C16 oxidase capable of oxidising -amyrin at the C28 position to a carboxylic acid and the C16 C28 to form echinocystic acid; said SoC28C16 oxidase having at least 50% sequence identity to SEQ ID NO: 4; and (iii) a SoC23 oxidase capable of oxidising echinocystic acid at the C-23 position to an aldehyde to form quillaic acid (QA), said SoC23 oxidase having at least 50% sequence identity to SEQ ID NO: 6. 
The heterologous nucleic acid in methods of the second aspect may further encode the following polypeptides; (iv) a Saponaria officinalis QA 3- SoCSL for attachment of D- glu GlcA to the 3-O position of quillaic acid to form 3-O- -D-glucopyranosiduronic acid}- quillaic acid QA-GlcA ; wherein the amino acid sequence of the SoCSL has at least 60% sequence identity to SEQ ID NO: 10; (v) Saponaria officinalis QA-GlcA SoC3Gal for attachment D-Galactose Gal -1->2 linkage to QA-GlcA to form 3-O- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal ; wherein the amino acid sequence of the SoC3Gal has at least 50% sequence identity to SEQ ID NO: 12; and (vi) a Saponaria officinalis QA-GlcA-Gal x SoC3Xyl for attachment of D-Xylose Xyl QA-GlcA-Gal to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal-Xyl QA-Tri ), wherein the amino acid sequence of SoC3Xyl has at least 50% sequence identity to SEQ ID NO: 14. 
The heterologous nucleic acid in methods of the second aspect may further encode the following polypeptides; (vii) a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu for the attachment of fucose Fuc to the 28-O position of QA-Tri to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1- >2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid(QA-TriF); said SoC28Fu having at least 60% sequence identity to SEQ ID NO: 16; (viii) a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha for the attachment of rhamnose Rha 1, 2 linkage to QA-TriF to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl- (1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFR); said SoC28Rha having at least 50% sequence identity to SEQ ID NO: 18; (ix) Saponaria officinalis QA-TriFR xyl SoC28Xyl1 for attachment of D-Xylose Xyl 4 linkage to QA-TriFR to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX); wherein the amino acid sequence of the SoC28Xyl1 has at least 50% sequence identity to SEQ ID NO: 20; and (x) a Saponaria officinalis QA-TriFRX x SoC28Xyl2 for attachment of D-Xylose Xyl QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl- (1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX); wherein the amino acid sequence of SoC28Xyl2 has at least 80% sequence identity to SEQ ID NO:22. 
The heterologous nucleic acid in methods of the second aspect may further encode the following polypeptides; (xi) a Saponaria officinalis QA- quinovose (Q) via a 1, 4 linkage to QA-TrFRXX to form 3-O- -D-xylopyranosyl- - -D-galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L- rhamnopyranosyl- - -D-quinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX), wherein the amino acid sequence of SoGH1 has at least 50% sequence identity to SEQ ID NO: 34, and/or (xii) a Saponaria officinalis QA- acetyl group to QA-TriF(Q)RXX to form saponarioside B, wherein the amino acid sequence of SoBAHD1 has at least 50% sequence identity to SEQ ID NO: 36. A third aspect of the invention provides host cell containing or transformed with a heterologous nucleic acid which comprises a plurality of nucleotide sequences, each of which encodes a polypeptide which in combination have triterpenoid biosynthesis activity, wherein the plurality of nucleotide sequences encode one or more of following polypeptides (i) a SoC28 oxidase -amyrin at the C28 position to a carboxylic acid ( C28 oxidase said SoC28 oxidase having at least 80% sequence identity to SEQ ID NO: 2; (ii) a SoC28C16 oxidase -amyrin at the C28 position to a carboxylic acid and the C16 C28C1 said C28C16 oxidase having at least 50% sequence identity to SEQ ID NO: 4; and (iii) a SoC23 oxidase capable of oxidising echinocystic acid at the C-23 position to an aldehyde ( SoC23 oxidase said SoC23 oxidase having at least 50% sequence identity to SEQ ID NO: 6; (iv) a Saponaria officinalis QA 3- SoCSL for attachment of D- GlcA to the 3-O position of quillaic acid to form 3-O- -D-glucopyranosiduronic acid}- quillaic acid QA-GlcA ; said SoQA-GlcT having at least 60% sequence identity to SEQ ID NO: 10 ; (v) Saponaria officinalis QA-GlcA SoC3Gal for attachment D-Galactose Gal -1->2 linkage to QA-GlcA to form 3-O- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic 
acid QA-GlcA-Gal ; wherein the amino acid sequence of the SoC3Gal has at least 50% sequence identity to SEQ ID NO: 12; and (vi) a Saponaria officinalis QA-GlcA-Gal xylosyl transf SoC3Xyl for attachment of D-Xylose Xyl QA-GlcA-Gal to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal-Xyl QA-Tri ), wherein the amino acid sequence of SoC3Xyl has at least 50% sequence identity to SEQ ID NO: 14; (vii) a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu for the attachment of fucose Fuc to the 28-O position of QA-Tri to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1- >2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid(QA-TriF); said SoC28Fu having at least 50% sequence identity to SEQ ID NO: 16; (viii) a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha for the attachment of rhamnose Rha 1, 2 linkage to QA-TriF to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl- (1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFR); said SoC28Rha having at least 50% sequence identity to SEQ ID NO: 18; (ix) Saponaria officinalis QA-TriFR xyl SoC28Xyl1 for attachment of D-Xylose Xyl 4 linkage to QA-TriFR to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFRX) ; wherein the amino acid sequence of the SoC28Xyl1 has at least 50% sequence identity to SEQ ID NO: 20; (x) a Saponaria officinalis QA-TriFRX x SoC28Xyl2 for attachment of D-Xylose Xyl QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX), wherein the amino acid sequence of SoC28Xyl2 has at least 50% 
sequence identity to SEQ ID NO: 22 (xi) a Saponaria officinalis QA- quinovose (Q) via a 1, 4 linkage to QA-TrFRXX to form 3-O- -D-xylopyranosyl- - -D-galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L- rhamnopyranosyl- - -D-quinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX), wherein the amino acid sequence of SoGH1 has at least 50% sequence identity to SEQ ID NO: 34, and/or (xii) a Saponaria officinalis QA- acetyl group to QA-TriF(Q)RXX to form saponarioside B, wherein the amino acid sequence of SoBAHD1 has at least 50% sequence identity to SEQ ID NO: 36. The plurality of nucleotide sequences in host cells of the third aspect may encode the following polypeptides; (i) a SoC28 oxidase -amyrin thereof at the C28 position to a carboxylic acid to form oleanolic acid; said SoC28 oxidase having at least 80% sequence identity to SEQ ID NO: 2; (ii) a SoC28C16 oxidase -amyrin at the C28 position to a carboxylic acid and/or the C16 position to an al C28 and (iii) a SoC23 oxidase capable of oxidising echinocystic acid at the C-23 position to an aldehyde to form quillaic acid (QA), said SoC23 oxidase having at least 50% sequence identity to SEQ ID NO: 6. 
The plurality of nucleotide sequences in host cells of the third aspect may further encode the following polypeptides; (iv) a Saponaria officinalis QA 3- SoCSL for attachment of D- GlcA to the 3-O position of quillaic acid to form 3-O- -D-glucopyranosiduronic acid}- quillaic acid QA-GlcA ; said SoQA-GlcT having at least 60% sequence identity to SEQ ID NO: 10; (v) Saponaria officinalis QA-GlcA SoC3Gal for attachment D-Galactose Gal -1->2 linkage to QA-GlcA to form 3-O- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal ; wherein the amino acid sequence of the QA-GlcA-Gal has at least 50% sequence identity to SEQ ID NO: 12; and (vi) a Saponaria officinalis QA-GlcA-Gal x SoC3Xyl for attachment of D-Xylose Xyl QA-GlcA-Gal to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal-Xyl QA-Tri), wherein the amino acid sequence of SoC3Xyl has at least 50% sequence identity to SEQ ID NO: 14. The plurality of nucleotide sequences in host cells of the third aspect may further encode the following polypeptides; (vii) a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu for the attachment of fucose Fuc to the 28-O position of QA-Tri to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1- >2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid (QA-TriF); said SoC28Fu having at least 60% sequence identity to SEQ ID NO: 16; (viii) a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha for the attachment of rhamnose Rha 1, 2 linkage to QA-TriF to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl- (1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFR); said SoC28Rha having at least 50% sequence identity to SEQ ID NO: 18; (ix) Saponaria officinalis QA-TriFR xyl SoC28Xyl1 for attachment of D-Xylose Xyl 4 linkage to QA-TriFR to form 3-O- -D-xylopyranosyl-(1->3)- 
-D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFRX); wherein the amino acid sequence of the SoC28Xyl1 has at least 50% sequence identity to SEQ ID NO: 20; and (x) a Saponaria officinalis QA-TriFRX x SoC28Xyl2 for attachment of D-Xylose Xyl QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- - D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX); wherein the amino acid sequence of SoC28Xyl2 has at least 80% sequence identity to SEQ ID NO:22. The plurality of nucleotide sequences in host cells of the third aspect may further encode the following polypeptides; (xi) a Saponaria officinalis QA- quinovose (Q) via a 1, 4 linkage to QA-TrFRXX to form 3-O- -d-xylopyranosyl- - -d-galactopyranosyl- - -d-glucopyranosiduronic acid}-28-O- -d-xylopyranosyl- - -d-xylopyranosyl- - -l- rhamnopyranosyl- - -d-quinovopyranosyl- - -d-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX), wherein the amino acid sequence of SoGH1 has at least 50% sequence identity to SEQ ID NO: 34, and/or (xii) a Saponaria officinalis QA- acetyl group to QA-TriF(Q)RXX to form saponarioside B, wherein the amino acid sequence of SoBAHD1 has at least 50% sequence identity to SEQ ID NO: 36. A fourth aspect of the invention provides a method of producing a host cell comprising transforming or transfecting a host cell with a heterologous nucleic acid which comprises a plurality of nucleotide sequences as set out in the second and third aspects. A fifth aspect provides a process for producing a transgenic plant which method comprises the steps of: (a) performing a method of the fourth aspect, wherein the host cell is a plant cell, and (b) regenerating a plant from the transformed plant cell. 
A sixth aspect provides a transgenic plant which is obtainable by the method of the fifth aspect, or which is a clone, or selfed or hybrid progeny or other descendant of said transgenic plant, wherein expression of said heterologous nucleic acid imparts an increased ability to carry out the biosynthesis compared to a wild-type plant otherwise corresponding to said transgenic plant.

A seventh aspect provides a method of producing a triterpenoid in a heterologous host, which method comprises culturing a host cell as set out in the third aspect and purifying the triterpenoid therefrom.

An eighth aspect provides a method of producing a triterpenoid in a heterologous host, which method comprises growing a plant of the sixth aspect and then harvesting it and purifying the triterpenoid therefrom.

A triterpenoid of the seventh and eighth aspects may be QA or a glycosylated QA, such as QA-Tri, QA-TriFRXX or QA-F(Q-Ac)RXX, or an intermediate or derivative thereof.

The SobAS, SoC28 oxidase, SoC23 oxidase, SoC28C16 oxidase, SoCSL, SoC3Gal, SoC3Xyl, SoC28Fu, SoC28Rha, SoC28Xyl1, SoC28Xyl2, SoGH1 and SoBAHD1 of the first to the eighth aspects may be obtained or derived from Saponaria officinalis.

Other aspects and embodiments of the invention are described in more detail below.

Brief Description of the Figures

Figure 1 shows the chemical structure of (A) saponariosides A/B from S. officinalis and (B) QS-21 from Q. saponaria.

Figure 2 shows the predicted biosynthetic pathway of saponariosides A/B. (A) Biosynthesis of the aglycone quillaic acid from 2,3-oxidosqualene. (B) Predicted order of quillaic acid decoration. QA, quillaic acid; F, fucose; Rh, rhamnose; X, xylose; Q, quinovose; A, acetyl moiety; GA, glucuronic acid; Gal, galactose; Ar, arabinose. (C) Simplified chemical structures of saponarioside A and B. GlcA, glucuronic acid; Xyl, xylose; Gal, galactose; Fuc, fucose; Rha, rhamnose; Qui, quinovose; Acyl, acetyl moiety.
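The legends for Figures 5 to 8 quote calculated [M-H]- values for compounds (4) to (7). As a rough cross-check, the sketch below recomputes them from monoisotopic atomic masses. The molecular formulae are not stated in the text; they are assumed here from the formula of quillaic acid (C30H46O5) plus one anhydro-sugar residue per glycosylation step. The quoted values are reproduced when the mass of a hydrogen atom is subtracted from the neutral monoisotopic mass, which is likewise an assumption about how they were calculated.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Assumed molecular formulae: quillaic acid plus GlcA (C6H8O6),
# Gal (C6H10O5) and Xyl (C5H8O4) anhydro-residues in turn.
FORMULAE = {
    "quillaic acid (4)": {"C": 30, "H": 46, "O": 5},
    "QA-Mono (5)": {"C": 36, "H": 54, "O": 11},
    "QA-Di (6)": {"C": 42, "H": 64, "O": 16},
    "QA-Tri (7)": {"C": 47, "H": 72, "O": 20},
}


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum of monoisotopic atomic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def m_minus_h(formula: dict[str, int]) -> float:
    """Neutral monoisotopic mass minus one hydrogen atom (assumed convention)."""
    return monoisotopic_mass(formula) - MONOISOTOPIC["H"]


if __name__ == "__main__":
    for name, formula in FORMULAE.items():
        print(f"{name}: {m_minus_h(formula):.4f}")
```

Subtracting the mass of a proton rather than a hydrogen atom would raise each value by the mass of one electron (about 0.0005), so the two conventions differ only in the fourth decimal place.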
Figure 3 shows the expression profile of candidate soapwort genes across different soapwort organs. SobAS1 was used to identify candidate genes of interest. The heatmap shows the raw RNA-Seq read counts normalized to the library size and rlog-transformed. The functional soapwort genes are labelled in bold and absolute transcript read counts of candidate genes are also shown.

Figure 4 shows the characterization of SobAS. N. benthamiana leaves transiently expressing AstHMGR and SobAS were extracted and analysed using GC-MS. (A) The total ion chromatograms (TIC) and the mass spectra are shown. The extract from N. benthamiana leaves expressing only AstHMGR was used as a negative control. The β-amyrin peak was identified by comparison with a commercial β-amyrin standard. (B) The function of SobAS1.

Figure 5 shows the characterization of SoCYP716A378, SoCYP716A379 and SoCYP72A984. (A) Pathway depicting the functions of SobAS1, SoCYP716A378, SoCYP716A379 and SoCYP72A984 in converting oxidosqualene to quillaic acid (4). N. benthamiana leaves transiently expressing various combinations of CYP genes were extracted and analysed using GC-MS or HPLC-MS. The extract from N. benthamiana leaves expressing only AstHMGR was used as a negative control (control) and highlighted peaks were identified by comparison with commercial standards (β-amyrin (1), oleanolic acid (2), echinocystic acid (3) and quillaic acid (4)). (B) GC-MS total ion chromatograms (TIC) and relevant mass spectra for N. benthamiana leaves transiently co-expressing tHMGR and SobAS1 with either SoCYP716A378 or SoCYP716A379. The activity of SoCYP716A378 produced a single peak of (2) while the activity of SoCYP716A379 produced an additional peak of (3). (C) HPLC-MS extracted ion chromatograms (EIC) and relevant mass spectra for N. benthamiana leaves transiently co-expressing tHMGR, SobAS1 and SoCYP716A379, with and without SoCYP72A984.
EIC displayed are for m/z 485.3267, the calculated mass of the [M-H]- adduct of quillaic acid (4). The additional activity of SoCYP72A984 produced a major new peak corresponding to (4).

Figure 6 shows the characterization of SoCSL1. (A) Structure of 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-Mono, (5)), the product of SoCSL1 when acting in combination with the S. officinalis enzymes required for production of quillaic acid (QA, (4)). The modification performed by SoCSL1 is highlighted and a table showing relevant calculated adducts and fragments of (5) is included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS; representative extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 661.3588, the calculated mass of the [M-H]- adduct of (5). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (4) (tHMGR, SobAS1, SoCYP716A379 and SoCYP72A984) (QA). The additional activity of SoCSL1 produced a peak corresponding to (5), identified by comparison with an authentic standard.

Figure 7 shows the characterization of SoUGT73DL1. (A) Structure of 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-Di, (6)), the product of SoUGT73DL1 when acting in combination with the S. officinalis enzymes required for production of 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-Mono, (5)). The modification performed by SoUGT73DL1 is highlighted and a table showing relevant calculated adducts and fragments of (6) is included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS; representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 823.4116, the calculated mass of the [M-H]- adduct of (6).
The negative controls used were extracts from N. benthamiana leaves co- expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (5) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984 and SoCSL1) (QA-mono). The additional activity of SoUGT73DL1 produced a peak corresponding to (6), identified by comparison with an authentic standard. Figure 8 shows the characterization of SoUGT73CC6. (A) Structure of 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid (QA-Tri, (7)), the product of SoUGT73CC6 when acting in combination with the S. officinalis enzymes required for production of 3-O- - D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid (QA-Di, (6)). Modification performed by SoUGT73CC6 has been highlighted and a table showing relevant calculated adducts and fragments of (7) included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 955.4539, the calculated mass of the [M-H]- adduct of (7). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co- expressing the S. officinalis genes required to produce (6) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL and SoUGT73DL1) (QA-Di). The additional activity of SoUGT73CC6 produced a peak corresponding to (7), identified by comparison with an authentic standard. Figure 9 shows the characterization of SoUGT74CD1 and SoSDR. (A) Structure of 3-O- -D-xylopyranosyl- (1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid (QA-TriF, (8)), the product of SoUGT74CD1 when acting in combination with the S. 
officinalis enzymes required for production of 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D- glucopyranosiduronic acid}-quillaic acid (QA-Tri, (7)). Modification performed by SoUGT74CD1 has been highlighted and a table showing relevant calculated adducts and fragments of (8) included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1101.5118, the calculated mass of the [M-H]- adduct of (8). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (7) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1 and SoUGT73CC6) (QA-Tri). The additional activity of SoUGT74CD1 produced a peak corresponding to (8), identified by comparison with an authentic standard. Two additional smaller peaks (8’) and (8’’), were also observed with the addition of SoUGT74CD1. Addition of SoSDR in combination with SoUGT74CD1 increased yields of peak (8) significantly but was not capable of producing (8) in the absence of SoUGT74CD1. Figure 10 shows the characterization of SoUGT79T1. (A) Structure of 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFR, (9)), the product of SoUGT79T1 when acting in combination with the S. officinalis enzymes required for production of 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid (QA- TriF, (8)). Modification performed by SoUGT79T1 has been highlighted and a table showing relevant calculated adducts and fragments of (9) included. (B) N. 
benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1247.5679, the calculated mass of the [M-H]- adduct of (9). The negative controls used were extracts from N. benthamiana leaves co- expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (8) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1, SoUGT73CC6 and SoUGT74CD1) (QA-TriF). The additional activity of SoUGT79T1 produced a peak corresponding to (9) based on MSMS spectra, and later confirmed by comparison of downstream products to authentic standards. Figure 11 shows the characterization of SoUGT79L3. (A) Structure of 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX, (10)), the product of SoUGT79L3 when acting in combination with the S. officinalis enzymes required for production of 3-O- -D-xylopyranosyl- (1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFR, (9)). Modification performed by SoUGT79L3 has been highlighted and a table showing relevant calculated adducts and fragments of (10) included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1379.6119, the calculated mass of the [M-H]- adduct of (10). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (9) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1, SoUGT73CC6, SoUGT74CD1 and SoUGT79T1) (QA-TriFR). 
The additional activity of SoUGT79L3 produced a peak corresponding to (10) based on MSMS spectra, and later confirmed by comparison of downstream products to authentic standards. Figure 12 shows the characterization of SoUGT73M2. (A) Structure of 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl- (1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX, (11)), the product of SoUGT73M2 when acting in combination with the S. officinalis enzymes required for production of 3-O- -D- xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- (1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX, (10)). Modification performed by SoUGT73M2 has been highlighted and a table showing relevant calculated adducts and fragments of (11) included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1511.6542, the calculated mass of the [M-H]- adduct of (11). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (10) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1, SoUGT73CC6, SoUGT74CD1, SoUGT79T1 and SoUGT79L3) (QA-TriFRX). The additional activity of SoUGT73M2 produced a peak corresponding to (11) based on MSMS spectra, and later confirmed by comparison of downstream products to authentic standards. Figure 13 shows the characterization of SoGH1. 
(A) Structure of 3-O- -d-xylopyranosyl- - -d- galactopyranosyl- - -d-glucopyranosiduronic acid}-28-O- -d-xylopyranosyl- - -d-xylopyranosyl- - -l-rhamnopyranosyl- - -d-quinovopyranosyl- - -d-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX, (12)), the product of SoGH1 when acting in combination with the S. officinalis enzymes required for production of 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFRXX, (11)). Modification performed by SoGH1 has been highlighted and a table showing relevant calculated adducts and fragments of (12) included. (B) N. benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS, representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1657.7115, the calculated mass of the [M-H]- adduct of (12). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (11) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1, SoUGT73CC6, SoUGT74CD1, SoUGT79T1, SoUGT79L3 and SoUGT73M2) (QA-TriFRXX). The additional activity of SoGH1 produced a peak corresponding to (12) confirmed by comparison to authentic standards. Figure 14 shows the characterization of SoBHAD1. (A) Structure of saponarioside B (13), the product of SoBAHD1 when acting in combination with the S. officinalis enzymes 3-O- -d-xylopyranosyl- - -d- galactopyranosyl- - -d-glucopyranosiduronic acid}-28-O- -d-xylopyranosyl- - -d-xylopyranosyl- - -l-rhamnopyranosyl- - -d-quinovopyranosyl- - -d-fucopyranosyl ester}-quillaic acid (QA- TriF(Q)RXX, (12)). Modification performed by SoBAHD1 has been highlighted and a table showing relevant calculated adducts and fragments of (13) included. (B) N. 
benthamiana leaves transiently co-expressing various genes were extracted and analysed using HPLC-MS; representative (n=6) extracted ion chromatograms (EIC) and MS/MS spectra are shown. EIC displayed are for m/z 1699.7206, the calculated mass of the [M-H]- adduct of (13). The negative controls used were extracts from N. benthamiana leaves co-expressing only AstHMGR (tHMGR control) or co-expressing the S. officinalis genes required to produce (12) (tHMGR, SobAS1, SoCYP716A379, SoCYP72A984, SoCSL, SoUGT73DL1, SoUGT73CC6, SoUGT74CD1, SoUGT79T1, SoUGT79L3, SoUGT73M2 and SoGH1) (QA-TriF(Q)RXX). The additional activity of SoBAHD1 produced a peak corresponding to (13), confirmed by comparison to authentic standards.

Figure 15 shows the biosynthesis of saponarioside B. (A) Bar charts displaying the relative accumulation of (4-13) in N. benthamiana with the stepwise expression of each enzyme in the pathway. Relative accumulation is based on the integrated peak area of extracted ion chromatograms. Mean values are plotted and error bars representative of the mean, n=6. (B) Proposed biosynthetic pathway converting oxidosqualene to saponarioside B (a precursor to saponarioside A, and isomer of SO1861). Structures confirmed by comparison to authentic standards are indicated with a black circle. The order of steps drawn is only proposed; the actual order in planta may vary.

Figure 16 shows the limited detection of β-amyrin (1) in both control DsRED and SobAS1 silenced hairy roots. The samples were extracted and analysed using GC/MS and the extracted ion chromatogram (EIC) at m/z 218 is shown. A commercial β-amyrin standard was used for comparison.

Figure 17 shows the GC/MS analysis of soapwort hairy root samples in comparison with cycloartenol standard. The extracted ion chromatogram (EIC) at m/z 408 and the MS of the detected cycloartenol peaks are shown.

Figure 18 shows the LC/MS analysis of soapwort hairy root samples in comparison with quillaic acid (4) standard.
The extracted ion chromatogram (EIC) at m/z 485.3267 and the MS/MS fragmentation of detected peaks are shown.

Figure 19 shows the LC/MS analysis of soapwort hairy root samples in comparison with saponarioside B (13) standard. The extracted ion chromatogram (EIC) at m/z 1699.7206 and the MS/MS fragmentation of detected peaks are shown.

Detailed Description

This invention relates to the production of triterpenoids, such as saponariosides and intermediates thereof, using biosynthetic enzymes encoded by newly characterised or identified genes from the soapwort plant (Saponaria officinalis) and variants thereof. These enzymes may include β-amyrin synthase (bAS; SobAS; SEQ ID NO: 8), SoC28 oxidase (SoC28; SEQ ID NO: 2), SoC23 oxidase (SoC23; SEQ ID NO: 6), C28C16 oxidase (SoC28C16; SEQ ID NO: 4), QA 3-O glucuronosyl transferase (SoQA-GlcAT; SoCSL; SEQ ID NO: 10), QA-GlcA galactosyl transferase (SoQA-GalT; SoC3Gal; SEQ ID NO: 12), QA-GlcA-Gal xylosyl transferase (SoQA-R XylT; SoC3Xyl; SEQ ID NO: 14), QA-Tri fucosyl transferase (SoQA-TriFuT; SoC28Fu; SEQ ID NO: 16), QA-TriF rhamnosyl transferase (SoQA-TriFRhaT; SoC28Rha; SEQ ID NO: 18), QA-TriFR xylosyl transferase (SoQA-TriFRXylT; SoC28Xyl1; SEQ ID NO: 20), QA-TriFRX xylosyl transferase (SoQA-TriFRXXylT; SoC28Xyl2; SEQ ID NO: 22), QA-TriFRXX quinovosyl transferase (SoGH1; SEQ ID NO: 34) and/or QA-TriF(Q)RXX acetyl transferase (SoBAHD1; SEQ ID NO: 36). Each of the genes, polypeptide sequences and nucleotide sequences described herein is optionally obtained or derived from S. officinalis.

The genes, polypeptide sequences and nucleotide sequences described herein may be useful in the production of cyclic triterpenes, such as β-amyrin, oleanolic acid, echinocystic acid, and glycosylated forms of QA, such as saponariosides, QS-7, QS-21 and analogues and intermediates of these glycosylated forms of QA.

In some embodiments, one, two, three, four or more genes described herein may be useful in the production of quillaic acid (QA).
QA is derived from β-amyrin, which is in turn synthesised by cyclisation of the universal linear precursor 2,3-oxidosqualene (OS) by oxidosqualene cyclases (OSCs). The β-amyrin scaffold is further oxidised to an alcohol, an aldehyde and a carboxylic acid at the C16, C-23 and C28 positions, respectively, to form QA. A proposed linear biosynthetic pathway is shown in Figure 15, although the three oxidation reactions may equally occur in a different order, via the corresponding intermediates.

In preferred embodiments, QA may be produced from OS using genes encoding biosynthetic enzymes as set out below.

2,3-oxidosqualene (OS) may be converted into β-amyrin using Saponaria officinalis β-amyrin synthase (SobAS). SobAS may have the amino acid sequence of SEQ ID NO: 8 or may be a variant or fragment thereof. Alternatively, 2,3-oxidosqualene (OS) may be converted into β-amyrin by an endogenous enzyme in a host cell.

The C28 position of β-amyrin may be oxidised to a carboxylic acid to produce oleanolic acid using a SoC28 oxidase (SoC28). SoC28 oxidase may have the amino acid sequence of SEQ ID NO: 2 or may be a variant or fragment thereof.

The C16 position of oleanolic acid may then be oxidised to an alcohol to produce echinocystic acid using a Saponaria officinalis C28C16 oxidase (SoC28C16). SoC28C16 oxidase may have the amino acid sequence of SEQ ID NO: 4 or may be a variant or fragment thereof.

Alternatively, the C28 position of β-amyrin may be oxidised to a carboxylic acid and the C16 position may be oxidised to an alcohol to produce echinocystic acid using a Saponaria officinalis C28C16 oxidase (SoC28C16). SoC28C16 oxidase may have the amino acid sequence of SEQ ID NO: 4 or may be a variant or fragment thereof.

The C-23 position of echinocystic acid may be oxidised to an aldehyde to produce QA using a SoC23 oxidase (SoC23). SoC23 oxidase may have the amino acid sequence of SEQ ID NO: 6 or may be a variant or fragment thereof.
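By way of illustration only, the three oxidations described above can be followed as changes in molecular formula. The Python sketch below assumes the standard formula of β-amyrin (C30H50O) and generic atom-count changes for each type of oxidation; neither is taken from this document, and the names used (BETA_AMYRIN, apply_delta, to_string) are hypothetical. It follows the proposed linear order, which may differ in planta.

```python
from collections import Counter

# Assumption: standard molecular formula of beta-amyrin (C30H50O).
BETA_AMYRIN = Counter({"C": 30, "H": 50, "O": 1})

# Generic net atom-count changes for each oxidation type (textbook chemistry,
# not values taken from this document).
METHYL_TO_CARBOXYLIC_ACID = {"H": -2, "O": 2}  # -CH3 -> -COOH, here at C-28
HYDROXYLATION = {"O": 1}                        # C-H -> C-OH, here at C-16
METHYL_TO_ALDEHYDE = {"H": -2, "O": 1}          # -CH3 -> -CHO, here at C-23


def apply_delta(formula, delta):
    """Return a new formula with the atom-count changes in delta applied."""
    result = Counter(formula)
    for element, change in delta.items():
        result[element] += change
    return result


def to_string(formula):
    """Format a C/H/O formula, omitting counts of 1 (e.g. 'C30H50O')."""
    return "".join(
        f"{element}{count if count > 1 else ''}"
        for element, count in sorted(formula.items())
    )


# One possible linear order of the three oxidations.
oleanolic_acid = apply_delta(BETA_AMYRIN, METHYL_TO_CARBOXYLIC_ACID)
echinocystic_acid = apply_delta(oleanolic_acid, HYDROXYLATION)
quillaic_acid = apply_delta(echinocystic_acid, METHYL_TO_ALDEHYDE)

for name, formula in [
    ("beta-amyrin", BETA_AMYRIN),
    ("oleanolic acid", oleanolic_acid),
    ("echinocystic acid", echinocystic_acid),
    ("quillaic acid", quillaic_acid),
]:
    print(f"{name}: {to_string(formula)}")
```

Under these assumptions the sketch yields C30H48O3 for oleanolic acid, C30H48O4 for echinocystic acid and C30H46O5 for quillaic acid. Because the changes are additive, applying the three oxidations in a different order gives the same final formula.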
In some embodiments, genes described herein may be useful in the glycosylation of the C3 position of QA. The C-3 sugar chain begins with a β-D-glucuronic acid (GlcA) residue attached at the 3-O position of QA. The GlcA residue is then linked to a D-galactose (Gal) residue via a β-1->2 linkage and to a D-xylose (Xyl) residue via a β-1->3 linkage.

In preferred embodiments, QA or C28 glycosylated forms of QA may be glycosylated at the 3-O position using genes encoding biosynthetic enzymes as set out below.

D-Glucuronic acid (GlcA) may be transferred to the 3-O position of quillaic acid to form 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA or QA-Mono) using a Saponaria officinalis QA 3-O glucuronosyl transferase (SoQA-GlcAT; SoCSL). The SoCSL may have the amino acid sequence of SEQ ID NO: 10 or may be a variant or fragment thereof.

D-Galactose (Gal) may be transferred via a β-1->2 linkage to QA-Mono (QA-GlcA) to form 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-Gal or QA-Di) using a Saponaria officinalis QA-GlcA galactosyl transferase (SoQA-GalT or SoC3Gal). SoC3Gal may have the amino acid sequence of SEQ ID NO: 12 or may be a variant or fragment thereof.

D-Xylose (Xyl) may be transferred via a β-1->3 linkage to QA-Di to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-[Gal]-Xyl or QA-Tri) using a Saponaria officinalis QA-GlcA-Gal xylosyl transferase (SoC3Xyl). SoC3Xyl may have the amino acid sequence of SEQ ID NO: 14 or may be a variant or fragment thereof.

In some embodiments, genes described herein may be useful in the glycosylation of the C28 position of QA or C-3 glycosylated forms of QA.

D-Fucose (Fuc) may be transferred to the 28-O position of QA-Tri to form 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-fucopyranosyl ester}-quillaic acid (QA-TriF) using a Saponaria officinalis QA-Tri fucosyl transferase (SoQA-TriFuT; SoC28Fu). SoC28Fu may have the amino acid sequence of SEQ ID NO: 16 or may be a variant or fragment thereof.
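As a worked illustration of the stepwise glycosylation described above, the following Python sketch estimates the [M-H]- m/z expected after each sugar addition from QA to QA-TriF. It is a sketch under stated assumptions, not the method used to generate the figures: the monoisotopic atomic masses, the formula of quillaic acid (C30H46O5) and the sugar residue formulas are standard values not given in this document, all names are hypothetical, and the [M-H]- value is approximated by subtracting the mass of one hydrogen atom (the electron mass is ignored).

```python
# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Assumption: quillaic acid is C30H46O5.
QUILLAIC_ACID = {"C": 30, "H": 46, "O": 5}

# Sugar residues as incorporated into a glycoside (free sugar minus H2O).
RESIDUES = {
    "GlcA": {"C": 6, "H": 8, "O": 6},   # glucuronic acid
    "Gal": {"C": 6, "H": 10, "O": 5},   # galactose (hexose)
    "Xyl": {"C": 5, "H": 8, "O": 4},    # xylose (pentose)
    "Fuc": {"C": 6, "H": 10, "O": 4},   # fucose (deoxyhexose)
}

# Order of additions as described in the text.
STEPS = [("QA-Mono", "GlcA"), ("QA-Di", "Gal"), ("QA-Tri", "Xyl"), ("QA-TriF", "Fuc")]


def mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def deprotonated_mz(neutral_mass):
    """Approximate [M-H]- m/z: neutral mass minus one hydrogen atom.

    Assumption: the electron mass (about 0.0005 u) is ignored.
    """
    return neutral_mass - MONOISOTOPIC["H"]


def pathway_mz():
    """Return {compound name: [M-H]- m/z rounded to 4 decimal places}."""
    current = mass(QUILLAIC_ACID)
    table = {"QA": round(deprotonated_mz(current), 4)}
    for name, sugar in STEPS:
        current += mass(RESIDUES[sugar])
        table[name] = round(deprotonated_mz(current), 4)
    return table


for compound, mz in pathway_mz().items():
    print(f"{compound}: {mz:.4f}")
```

With these assumptions the values for QA, QA-Mono, QA-Di, QA-Tri and QA-TriF agree with the m/z values quoted for compounds (4) to (8) in the figure descriptions (485.3267, 661.3588, 823.4116, 955.4539 and 1101.5118). Including the electron mass would shift each value upward by roughly 0.0005.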
L-Rhamnose Rhap may be transferred -1->2 linkage to QA-TriF to form 3-O- -D-xylopyranosyl-(1- >3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFR) using a Saponaria officinalis QA-TriF rhamnosyl transferase SoQA-TriFRhaT SoC28Rha ). SoC28Rha may have the amino acid sequence of SEQ ID NO: 18 or may be a variant or fragment thereof. D- Xyl may be transferred via a 1->4 linkage to QA-TriFR to form 3-O- -D-xylopyranosyl-(1->3)- - D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX) using a Saponaria officinalis QA- TriFR xyl SoQA-TriFRXylT or SoC28Xyl1). SoC28Xyl1 may have the amino acid sequence of SEQ ID NO: 20 or may be a variant or fragment thereof. D- Xyl may be transferred via a 1->3 linkage to QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D- xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX).using a Saponaria officinalis QA-TriFRX xyl SoQA-TriFRXXylT or SoC28Xyl2). SoQA- TriFRXXylT may have the amino acid sequence of SEQ ID NO: 22 or may be a variant or fragment thereof. The quinovosyl group of QA-TriF(Q)RXX may be acetylated to form 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl- (1->4)- -L-rhamnopyranosyl-(1->2)- 2)- -D-4-O-acetylquinovopyranosyl- - -D-fucopyranosyl ester}- quillaic acid (SpB) using a Saponaria officinalis QA-TriF(Q)RXX acetyl BAHD1 polypeptide. SoBAHD1 may have the amino acid sequence of SEQ ID NO: 36 or may be a variant or fragment thereof. In preferred embodiments, the methods described herein will include the use of one or more of these newly characterised triterpenoid biosynthetic nucleic acids (e.g. 
one, two, three or more such nucleic acids) optionally in conjunction with the manipulation of other genes affecting QA or glycosylated QA biosynthesis known in the art.

These newly characterised triterpenoid biosynthetic amino acid and nucleotide sequences from Saponaria officinalis (SEQ ID NOs: 1 to 22 and 33 to 36) form aspects of the invention in their own right, as do variants of these sequences and methods of using them. Any one of these sequences or variants may be used to alter the QA or glycosylated QA content of a plant, as disclosed herein. For instance, a variant nucleic acid may include a sequence encoding a variant polypeptide sharing the relevant biological activity of the native polypeptide, as discussed above. Examples include variants of any of SEQ ID NOs: 1 to 22 and 33 to 36.

For brevity, in the context of the present invention, and in particular the methods and uses described herein, the polypeptide or nucleotide sequences of SEQ ID NOs: 1 to 22 and 33 to 36 and variants thereof described herein may be referred to as triterpenoid biosynthetic sequences, or as triterpenoid biosynthetic genes and triterpenoid biosynthetic polypeptides.

Provided herein is a Saponaria officinalis β-amyrin synthase (SobAS) polypeptide having the amino acid sequence of SEQ ID NO: 8 or a variant thereof, for example an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 8. Also provided herein is a nucleic acid encoding said SobAS polypeptide having the nucleotide sequence of SEQ ID NO: 7 or a variant thereof, for example a nucleotide sequence with at least 80% sequence identity to SEQ ID NO: 7; and a vector comprising said nucleic acid. The SobAS polypeptide may be capable of cyclisation of the universal linear precursor 2,3-oxidosqualene (OS) to a triterpene.

Also provided herein is a SoC28 oxidase polypeptide having the amino acid sequence of SEQ ID NO: 2 or a variant thereof, for example an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 2.
Also provided herein is a nucleic acid encoding said SoC28 oxidase polypeptide having the nucleotide sequence of SEQ ID NO: 1 or a variant thereof, for example a nucleotide sequence with at least 80% sequence identity to SEQ ID NO: 1; and a vector comprising said nucleic acid. The SoC28 oxidase polypeptide may be capable of oxidising β-amyrin at the C28 position to a carboxylic acid, forming oleanolic acid.

Also provided herein is a SoC28C16 oxidase polypeptide having the amino acid sequence of SEQ ID NO: 4 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 4. Also provided herein is a nucleic acid encoding said SoC28C16 oxidase polypeptide having the nucleotide sequence of SEQ ID NO: 3 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 3; and a vector comprising said nucleic acid. The SoC28C16 oxidase polypeptide may be capable of oxidising β-amyrin at the C16 position to an alcohol and at the C28 position to a carboxylic acid to form echinocystic acid.

Also provided herein is a SoC23 oxidase polypeptide having the amino acid sequence of SEQ ID NO: 6 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 6. Also provided herein is a nucleic acid encoding said SoC23 oxidase polypeptide having the nucleotide sequence of SEQ ID NO: 5 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 5; and a vector comprising said nucleic acid. The SoC23 oxidase may be capable of oxidising echinocystic acid at the C-23 position to an aldehyde, forming QA.

Also provided herein is a Saponaria officinalis QA 3-O glucuronosyl transferase SoCSL polypeptide having the amino acid sequence of SEQ ID NO: 10 or a variant thereof, for example an amino acid sequence with at least 60% sequence identity to SEQ ID NO: 10.
Also provided herein is a nucleic acid encoding said QA 3- SoCSL polypeptide having the nucleotide sequence of SEQ ID NO: 9 or a variant thereof, for example a nucleotide sequence with at least 60% sequence identity to SEQ ID NO: 9; and a vector comprising said nucleic acid. The SoCSL may be capable of attaching D-glucuronic acid -O position of quillaic acid to form 3-O- -D-glucopyranosiduronic acid}-quillaic acid - - Also provided herein is a Saponaria officinalis QA-GlcA SoC3Gal polypeptide having the amino acid sequence of SEQ ID NO: 12 or a variant thereof, for example an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 12. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-GlcA SoC3Gal polypeptide having the nucleotide sequence of SEQ ID NO: 11 or a variant thereof, for example a nucleotide sequence with at least 80% sequence identity to SEQ ID NO: 11; and a vector comprising said nucleic acid. The SoC3Gal may be capable of attaching D- -1->2 linkage to QA-GlcA to form 3-O- -D-galactopyranosyl- (1->2)]- -D-glucopyranosiduronic acid}-quillaic acid - -GlcA- Also provided herein is a Saponaria officinalis QA-GlcA-Gal x SoC3Xyl polypeptide having the amino acid sequence of SEQ ID NO: 14 or a variant thereof, for example an amino acid sequence with at least 80% sequence identity to SEQ ID NO: 14. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-GlcA-Gal x SoC3Xyl polypeptide having the nucleotide sequence of SEQ ID NO: 13 or a variant thereof, for example a nucleotide sequence with at least 80% sequence identity to SEQ ID NO: 13; and a vector comprising said nucleic acid. 
The SoC3Xyl may be capable of attaching D- -GlcA-Gal to form 1, 3-O- -D-xylopyranosyl-(1- >3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}- QA- -Tri Also provided herein is a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu polypeptide having the amino acid sequence of SEQ ID NO: 16 or a variant thereof, for example an amino acid sequence with at least 60% sequence identity to SEQ ID NO: 16. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu polypeptide having the nucleotide sequence of SEQ ID NO: 15 or a variant thereof, for example a nucleotide sequence with at least 60% sequence identity to SEQ ID NO: 15; and a vector comprising said nucleic acid. The SoC28Fu may be capable of attaching -O position of QA-Tri to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1- >2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid (QA-TriF). Also provided herein is a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha polypeptide having the amino acid sequence of SEQ ID NO: 18 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 18. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha polypeptide having the nucleotide sequence of SEQ ID NO: 17 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 17; and a vector comprising said nucleic acid. The SoC28Rha may be capable of -TriF to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D- fucopyranosyl ester}-quillaic acid (QA-TriFR). 
Also provided herein is a Saponaria officinalis QA-TriFR xyl SoC28Xyl1 polypeptide having the amino acid sequence of SEQ ID NO: 20 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 20. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-TriFR xyl SoC28Xyl1 polypeptide having the nucleotide sequence of SEQ ID NO: 19 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 19; and a vector comprising said nucleic acid. The SoC28Xyl1 may be capable of attaching D- -TriFR to form 3-O- -D-xylopyranosyl-(1->3)- - D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX). Also provided herein is a Saponaria officinalis QA-TriFRX xyl SoC28Xyl2 polypeptide having the amino acid sequence of SEQ ID NO: 22 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 22. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-TriFRX xyl SoC28Xyl2 polypeptide having the nucleotide sequence of SEQ ID NO: 21 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 21; and a vector comprising said nucleic acid. The SoC28Xyl2may be capable of attaching D- QA-TriFRX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D- xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX). Also provided herein is a Saponaria officinalis QA-TriFRXX quinovosyl transferase SoGH1 polypeptide having the amino acid sequence of SEQ ID NO: 34 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 34. 
Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-TriFRXX quinovosyl transferase SoGH1 polypeptide having the nucleotide sequence of SEQ ID NO: 33 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 33; and a vector comprising said nucleic acid. The SoGH1 may be capable of attaching D- QA-TriFRXX to form 3-O- -D-xylopyranosyl- - -D- galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L-rhamnopyranosyl- - -D-quinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA-TriF(Q)RXX). Also provided herein is a Saponaria officinalis QA-TriF(Q)RXX acetyl transferase SoBAHD1 polypeptide having the amino acid sequence of SEQ ID NO: 36 or a variant thereof, for example an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 36. Also provided herein is a nucleic acid encoding said Saponaria officinalis QA-TriF(Q)RXX acetyl transferase SoBAHD1 polypeptide having the nucleotide sequence of SEQ ID NO: 35 or a variant thereof, for example a nucleotide sequence with at least 50% sequence identity to SEQ ID NO: 35; and a vector comprising said nucleic acid. The SoBAHD1 may be capable of acetylating QA-TriF(Q)RXX to form 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L- rhamnopyranosyl-(1->2)- -D-4-O-acetylquinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (SpB). An amino acid sequence described herein that is a variant of a reference sequence, such as a peptide, polypeptide or protein sequence described herein, for example any one of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 34 or 36, may have 1 or more amino acid residues altered relative to the reference sequence. 
For example, 50 or fewer amino acid residues may be altered relative to the reference sequence, preferably 45 or fewer, 40 or fewer, 30 or fewer, 20 or fewer, 15 or fewer, 10 or fewer, 5 or fewer or 3 or fewer, 2 or 1. For example, a variant described herein may comprise the sequence of a reference sequence with 50 or fewer, 45 or fewer, 40 or fewer, 30 or fewer, 20 or fewer, 15 or fewer, 10 or fewer, 5 or fewer, 3 or fewer, 2 or 1 amino acid residues mutated. An amino acid residue in the reference sequence may be altered or mutated by insertion, deletion or substitution, preferably substitution for a different amino acid residue. Such alterations may be caused by one or more of addition, insertion, deletion or substitution of one or more nucleotides in the encoding nucleic acid. A nucleotide sequence described herein that is a variant of a reference sequence, such as a nucleotide sequence described herein, for example any one of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,21, 33 or 35, may have 1 or more nucleotides altered relative to the reference sequence. For example, 50 or fewer nucleotides may be altered relative to the reference sequence, preferably 45 or fewer, 40 or fewer, 30 or fewer, 20 or fewer, 15 or fewer, 10 or fewer, 5 or fewer or 3 or fewer, 2 or 1. For example, a variant described herein may comprise the sequence of a reference sequence with 50 or fewer, 45 or fewer, 40 or fewer, 30 or fewer, 20 or fewer, 15 or fewer, 10 or fewer, 5 or fewer, 3 or fewer, 2 or 1 nucleotides mutated. A peptide, polypeptide or protein as described herein or a nucleotide sequence as described herein that is a variant of a reference sequence, such as an amino acid or nucleotide sequence described above, may share at least 50% sequence identity with the reference sequence, at least 55%, at least 60%, at least 65%, at least 70%, at least about 80%, at least 90%, at least 95%, at least 98% or at least 99% sequence identity. 
For example, a variant of a protein described herein may comprise an amino acid sequence that has at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least about 80%, at least 90%, at least 95%, at least 98% or at least 99% sequence identity with the reference amino acid sequence, for example one or more of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 34 and 36. A variant of a nucleic acid described herein may comprise a nucleotide sequence that has at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least about 80%, at least 90%, at least 95%, at least 98% or at least 99% sequence identity with the reference nucleotide sequence, for example one or more of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 33 and 35. Variants of different triterpenoid biosynthetic sequences may share different levels of sequence identity with their respective reference sequences. Combinations of variant triterpenoid biosynthetic sequences with all levels of sequence identity disclosed above are encompassed by the invention. Sequence identity is commonly defined with reference to the algorithm GAP (Wisconsin GCG package, Accelrys Inc, San Diego USA). GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197), or the TBLASTN program of Altschul et al. (1990) supra, generally employing default parameters.
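As a rough illustration of how a percent identity figure can be derived from a global alignment of two complete sequences, the following Python sketch implements a simplified Needleman and Wunsch alignment. It is not the GAP program: it assumes a simple match/mismatch score and a single linear gap penalty rather than separate gap creation and extension penalties, and the function names and scoring values are hypothetical choices made for illustration only. Identity is computed here as identical aligned positions divided by alignment length, which is only one of several conventions in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Illustrative scoring only (assumed values, not the GAP defaults).
    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align residues
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one best alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


print(global_align("ACGT", "AGT"))      # ('ACGT', 'A-GT')
print(percent_identity("ACGT", "AGT"))  # 75.0
```

Under these assumptions, a candidate variant could be compared with a reference sequence and the result checked against a chosen threshold such as 50%. For real comparisons, an established tool with its documented default parameters would normally be used instead.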
In particular, the PSI-BLAST algorithm may be used (Nucl. Acids Res. (1997) 25: 3389-3402). Sequence identity and similarity may also be determined using Genomequest™ software (Gene-IT, Worcester MA USA). Sequence comparisons are preferably made over the full length of the relevant sequence described herein. A variant polypeptide may share the relevant biological activity of the reference polypeptide. A variant nucleic acid may encode the relevant variant polypeptide. In this context, the relevant biological activity of a polypeptide described herein is the ability to catalyse the respective reaction shown in Fig. 15 and described above. The relevant biological activities may be assayed in vitro based on the reactions shown in Fig. 15. Alternatively, they can be assayed by activity in vivo as described in the Examples, i.e. by introducing into a host a plurality of heterologous constructs to generate the respective product, which can be assayed by LC-MS or the like. Preferred variants may be: (i) Naturally occurring nucleic acids such as alleles (which will include polymorphisms or mutations at one or more bases) or pseudoalleles (which may occur at closely linked loci to the biosynthetic genes described herein). Also included are paralogues, isogenes, or other homologous genes belonging to the same families as the biosynthetic genes described herein, for example sharing clades or sub-clades. Also included are orthologues or homologues from other plant species (i.e., plants other than S. officinalis). Homology may be at the nucleotide sequence and/or amino acid sequence level, as discussed below. (ii) Artificial nucleic acids, which can be prepared by the skilled person in the light of the present disclosure. Such derivatives may be prepared, for instance, by site directed or random mutagenesis, or by direct synthesis. Preferably the variant nucleic acid is generated either directly or indirectly (e.g. via one or more amplification or replication steps) from an original nucleic acid having all or part of the sequence of a biosynthetic gene described herein.
Variants may also include nucleic acids corresponding to those above, but which have been extended at the 3' or 5' terminus. A method of producing a variant triterpenoid biosynthetic nucleic acid may comprise the step of modifying any of the genes described herein, for example one or more of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 21, 33 and 35. Changes may be desirable for a number of reasons. For instance, they may introduce or remove restriction endonuclease sites or alter codon usage. This may be particularly desirable where the genes are to be expressed in alternative hosts e.g. microbial hosts such as yeast. Methods of codon optimizing genes for this purpose are known in the art (see e.g. Elena, Claudia, et al. "Expression of codon optimized genes in microbial systems: current industrial applications and perspectives." Frontiers in microbiology 5 (2014)). Sequences described herein including codon modifications to maximise yeast expression represent embodiments of the invention. Alternatively, changes to a sequence may produce a derivative by way of one or more (e.g. several) of addition, insertion, deletion or substitution of one or more nucleotides in the nucleic acid, leading to the addition, insertion, deletion or substitution of one or more (e.g. several) amino acids in the encoded polypeptide. Such changes may modify sites which are required for post translation modification such as cleavage sites in the encoded polypeptide; motifs in the encoded polypeptide for phosphorylation etc. Leader or other targeting sequences (e.g. membrane or golgi locating sequences) may be added to the expressed protein to determine its location following expression if it is desired to isolate it from a microbial system. Other desirable mutations may be random or site-directed mutagenesis in order to alter the activity (e.g. specificity) or stability of the encoded polypeptide. Changes may be by way of conservative variation, i.e. 
substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as arginine for lysine, glutamic for aspartic acid, or glutamine for asparagine. As is well known to those skilled in the art, altering the primary structure of a polypeptide by a conservative substitution may not significantly alter the activity of that peptide because the side-chain of the amino acid which is inserted into the sequence may be able to form similar bonds and contacts as the side chain of the amino acid which has been substituted out. This is so even when the substitution is in a region which is critical in determining the peptides conformation. Also included are variants having non-conservative substitutions. As is well known to those skilled in the art, substitutions to regions of a peptide which are not critical in determining its conformation may not greatly affect its activity because they do not greatly alter the peptide's three-dimensional structure. In regions which are critical in determining the peptides conformation or activity such changes may confer advantageous properties on the polypeptide. Indeed, changes such as those described above may confer slightly advantageous properties on the peptide e.g. altered stability or specificity. In some embodiments, a variant nucleotide sequence encoding a So polypeptide may be obtainable by means of a method which includes: (a) providing a preparation of nucleic acid, e.g. from plant cells. Test nucleic acid may be provided from a cell as genomic DNA, cDNA or RNA, or a mixture of any of these, preferably as a library in a suitable vector. If genomic DNA is used the probe may be used to identify untranscribed regions of the gene (e.g. 
promoters etc.), such as are described hereinafter, (b) providing a nucleic acid molecule which is a probe or primer as discussed above, (c) contacting nucleic acid in said preparation with said nucleic acid molecule under conditions for hybridisation of said nucleic acid molecule to any said gene or homologue in said preparation, and, (d) identifying said gene or homologue if present by its hybridisation with said nucleic acid molecule. Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently, or enzymatically labelled. Other methods not employing labelling of probe include amplification using PCR (see below), RNase cleavage and allele specific oligonucleotide probing. The identification of successful hybridisation is followed by isolation of the nucleic acid which has hybridised, which may involve one or more steps of PCR or amplification of a vector in a suitable host. Preliminary experiments may be performed by hybridising under low stringency conditions. For probing, preferred conditions are those which are stringent enough for there to be a simple pattern with a small number of hybridisations identified as positive which can be investigated further. For example, hybridizations may be performed, according to the method of Sambrook et al. (below) using a - ented salmon sperm DNA, 0.05% sodium pyrophosphate and up to 50% formamide. Hybridization is carried out at 37-42 °C for at least six hours. Following hybridization, filters are washed as follows: (1) 5 minutes at room temperature in 2X SSC and 1% SDS; (2) 15 minutes at room temperature in 2X SSC and 0.1% SDS; (3) 30 minutes - 1 hour at 37 °C in 1X SSC and 1% SDS; (4) 2 hours at 42-65 °C in 1X SSC and 1% SDS, changing the solution every 30 minutes.
One common formula for calculating the stringency conditions required to achieve hybridization between nucleic acid molecules of a specified sequence homology is (Sambrook et al., 1989):
Tm = 81.5 °C + 16.6 Log [Na+] + 0.41 (% G+C) - 0.63 (% formamide) - 600/#bp in duplex
As an illustration of the above formula, using [Na+] = 0.368 and 50% formamide, with GC content of 42% and an average probe size of 200 bases, the Tm is 57 °C. The Tm of a DNA duplex decreases by 1 - 1.5 °C with every 1% decrease in homology. Thus, targets with greater than about 75% sequence identity would be observed using a hybridization temperature of 42 °C. Such a sequence would be considered substantially homologous to the nucleic acid sequence of the present invention. It is well known in the art to increase stringency of hybridisation gradually until only a few positive clones remain. Other suitable conditions include, e.g. for detection of sequences that are about 80-90% identical, hybridization overnight at 42 °C in 0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 55 °C in 0.1X SSC, 0.1% SDS. For detection of sequences that are greater than about 90% identical, suitable conditions include hybridization overnight at 65 °C in 0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 60 °C in 0.1X SSC, 0.1% SDS.
In a further embodiment, hybridization of a triterpenoid biosynthetic nucleic acid molecule to a variant may be determined or identified indirectly, e.g. using a nucleic acid amplification reaction, particularly the polymerase chain reaction (PCR). PCR requires the use of two primers to specifically amplify target nucleic acid, so preferably two nucleic acid molecules with sequences characteristic of a triterpenoid biosynthetic gene are employed. Using RACE PCR, only one such primer may be needed (see "PCR protocols; A Guide to Methods and Applications", Eds. Innis et al, Academic Press, New York, (1990)). Thus, a method involving use of PCR in obtaining a variant triterpenoid biosynthetic nucleic acid as described herein may include: (a) providing a preparation of plant nucleic acid, e.g. from a seed or other appropriate tissue or organ, (b) providing a pair of nucleic acid molecule primers useful in (i.e. suitable for) PCR, at least one of said primers being a primer directed to a triterpenoid biosynthetic sequence as discussed above, (c) contacting nucleic acid in said preparation with said primers under conditions for performance of PCR, (d) performing PCR and determining the presence or absence of an amplified PCR product. The presence of an amplified PCR product may indicate identification of a variant. In all cases above, if need be, clones or fragments identified in the search can be extended. For instance, if it is suspected that they are incomplete, the original DNA source (e.g. a clone library, mRNA preparation etc.) can be revisited to isolate missing portions e.g. using sequences, probes or primers based on that portion which has already been obtained to identify other clones containing overlapping sequence.
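Returning to the melting temperature formula quoted earlier in the hybridisation discussion, the worked illustration can be checked numerically with a short Python sketch. The function name and parameter names are hypothetical, and two assumptions are made that the text does not state explicitly: that "Log" denotes the base-10 logarithm and that [Na+] is expressed as a molar concentration.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Estimate Tm in degrees C using the formula quoted in the text.

    Assumes a base-10 logarithm and [Na+] given in mol/L.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)


# Values from the illustration in the text
tm = melting_temperature(na_molar=0.368, gc_percent=42,
                         formamide_percent=50, duplex_bp=200)
print(round(tm))  # 57
```

With these inputs the sketch reproduces the Tm of approximately 57 °C given in the illustration.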
The methods described herein may utilise fragments of the triterpenoid biosynthetic genes described herein, for example one or more of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 33 and 35; or fragments of variants of these genes. Also provided is the production and use of fragments of the full-length polypeptides, each of which is shorter than the full-length polypeptide but retains its essential biological activity, e.g. in relation to production of QA or the glycosylation of QA. A fragment of a full-length reference triterpenoid biosynthetic polypeptide sequence, such as SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 34 or 36, is a contiguous sequence of amino acids from the full-length protein sequence that consists of at least one fewer amino acid than the full-length protein sequence. For example, a fragment may lack a sequence of 10 or more, 20 or more, 50 or more or 100 or more amino acids relative to the full-length sequence. Preferably a fragment shares the relevant biological activity of the full-length reference polypeptide. In some embodiments, fragments of the polypeptides may include one or more epitopes useful for raising antibodies to a portion of any of the amino acid sequences disclosed herein. Preferred epitopes are those to which antibodies are able to bind specifically, which may be taken to mean binding a polypeptide or fragment thereof with an affinity which is at least about 1000x that for other polypeptides. Purified protein (polypeptide, enzyme), or a fragment, mutant, derivative or variant thereof, e.g. produced recombinantly by expression from encoding triterpenoid biosynthetic nucleic acid therefor, forms an aspect of the invention. Such purified polypeptides may be used to raise antibodies employing techniques which are standard in the art. Antibodies and polypeptides comprising antigen-binding fragments of antibodies may be used in identifying homologues from other species as discussed further below.
Methods of producing antibodies include immunising a mammal (e.g. human, mouse, rat, rabbit, horse, goat, sheep or monkey) with the protein or a fragment thereof. Antibodies may be obtained from immunised animals using any of a variety of techniques known in the art, and might be screened, preferably using binding of antibody to antigen of interest. For instance, Western blotting techniques or immunoprecipitation may be used (Armitage et al, 1992, Nature 357: 80-82). Antibodies may be polyclonal or monoclonal. As an alternative or supplement to immunising a mammal, antibodies with appropriate binding specificity may be obtained from a recombinantly produced library of expressed immunoglobulin variable domains, e.g. using lambda bacteriophage or filamentous bacteriophage which display functional immunoglobulin binding domains on their surfaces; for instance see WO92/01047. Antibodies raised to a polypeptide or peptide can be used in the identification and/or isolation of homologous polypeptides, and then the encoding genes. The term "antibody" should be construed as covering any specific binding substance having a binding domain with the required specificity. Thus, this term covers antibody fragments, derivatives, functional equivalents and homologues of antibodies, including any polypeptide comprising an immunoglobulin binding domain, whether natural or synthetic. Mevalonic acid (MVA) is an important intermediate in triterpenoid synthesis. Therefore, it may be desirable to express rate-limiting MVA pathway genes in the host, to maximise yields of a triterpenoid, such as QA. HMG-CoA reductase (HMGR) is believed to be a rate-limiting enzyme in the MVA pathway. The use of a recombinant feedback-insensitive truncated form of HMGR (tHMGR) has been demonstrated to increase triterpene (β-amyrin) content upon transient expression in N. benthamiana [Reed, J., et al. Metab Eng, 2017. 42: p.185-193]. In some embodiments, a heterologous HMGR (e.g.
a feedback insensitive HMGR) may be used along with the triterpenoid biosynthetic genes described herein. Examples of HMGR encoding or polypeptide sequences include SEQ ID Nos 23-26, or variants or fragments of these. Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to biosynthetic genes or polypeptides as described above. For example an HMGR native to the host being utilised may be preferred for example a yeast HMGR in a yeast host, and so on. HMGR genes are known in the art and may be selected, as appropriate in the light of the present disclosure. It has also been reported that squalene synthase (SQS) is a potential rate-limiting step [Reed et al supra]. In some embodiments, a heterologous SQS may be used along with the biosynthetic genes described herein and optionally HMGR described herein. Examples of SQS encoding or polypeptide sequences include SEQ ID Nos 27 and 28, or variants or fragments of these. Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to biosynthetic genes or polypeptides as described above. For example an SQS native to the host being utilised may be preferred for example a yeast SQS in a yeast host, and so on. SQS genes are known in the art and may be selected, as appropriate in the light of the present disclosure. When using certain hosts (for example yeasts) it may be desirable to introduce additional genes to improve the flux of biosynthetic production. Examples may include one or more plant cytochrome P450 reductases (CPRs) to serve as the redox partner to the introduced P450s. In some embodiments, a heterologous cytochrome P450 reductase such as AtATR2 (Arabidopsis thaliana cytochrome P450 reductase 2) may be used along with the biosynthetic polypeptides and genes described herein. Examples of AtATR2 encoding or polypeptide sequences include SEQ ID Nos 29 and 30, or variants or fragments of these. 
Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to biosynthetic polypeptides and genes as described above. In some embodiments, a heterologous nucleic acid described herein may further encode one or more of the following polypeptides: (i) an HMG-CoA reductase (HMGR) and/or (ii) a squalene synthase (SQS). HMGR or SQS may be optionally selected from the respective polypeptides in SEQ ID NOs 24, 26 and 28 or variants or fragments of any of said polypeptides or are encoded by the respective polynucleotides of SEQ ID NOs 23, 25 and 27, or variants or fragments of any of said polynucleotides. Nucleic acid may include cDNA, RNA, genomic DNA and modified nucleic acids or nucleic acid analogues (e.g. peptide nucleic acid). Where a DNA sequence is specified, e.g. with reference to a figure, unless context requires otherwise the RNA equivalent, with U substituted for T where it occurs, is encompassed. Nucleic acids may include more than one nucleic acid molecule. Nucleic acid molecules according to the present invention may be provided isolated and/or purified from their natural environment, in substantially pure or homogeneous form, or free or substantially free of other nucleic acids of the species of origin, and The nucleic acid molecules may be wholly or partially synthetic. In particular they may be recombinant in that nucleic acid sequences which are not found together in nature (do not run contiguously) have been ligated or otherwise combined artificially. Nucleic acids may comprise, consist, or consist essentially of, any of the sequences discussed hereinafter. The complement of a nucleic acid described herein means the complementary sequence of the or a nucleotide sequence comprised by the nucleic acid. Optionally, complementary sequences are full length compared to the reference nucleotide sequence. The term "heterologous" is used broadly herein to indicate that the gene/sequence of nucleotides in question (e.g. 
encoding biosynthesis modifying polypeptides) have been introduced into said cells of the host or an ancestor thereof, using genetic engineering, i.e. by human intervention. Nucleic acid heterologous to a host cell will be non-naturally occurring in cells of that type, variety or species. Thus the heterologous nucleic acid may comprise a coding sequence of or derived from a particular type of plant cell or species or variety of plant, placed within the context of a plant cell of a different type or species or variety of plant. A further possibility is for a nucleic acid sequence to be placed within a cell in which it or a homologue is found naturally, but wherein the nucleic acid sequence is linked and/or adjacent to nucleic acid which does not occur naturally within the cell, or cells of that type or species or variety of plant, such as operably linked to one or more regulatory sequences, such as a promoter sequence, for control of expression. The nucleotide sequences of the heterologous nucleic acid alter one or more characteristics of the host, for example the ability to biosynthesise a triterpenoid, such as QA or glycosylated QA e.g. QA-Tri, QA-TriFRXX, QA-TriF(Q)RXX or SpB. Such transformation may be transient or stable. The host may be one that does not naturally produce detectable or recoverable levels of product under normal metabolic circumstances of that host. Following the application of the invention it is able to produce detectable or recoverable levels of product. The nucleotide sequence information provided herein may be used to design probes and primers for probing or amplification. An oligonucleotide for use in probing or PCR may be about 30 or fewer nucleotides in length (e.g. 18, 21 or 24). Generally specific primers are upwards of 14 nucleotides in length. For optimum specificity and cost effectiveness, primers of 16-24 nucleotides in length may be preferred. Those skilled in the art are well versed in the design of primers for use in processes such as PCR.
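The primer length guidance above can be expressed as a simple screening helper. The following Python sketch is illustrative only: the function name and category labels are hypothetical, and it assumes that "upwards of 14" means 14 nucleotides or more and that "about 30 or fewer" is treated as a hard upper limit of 30. Length is only one consideration in primer design, and factors such as melting temperature and self-complementarity are not addressed here.

```python
def primer_length_category(primer):
    """Classify an oligonucleotide by length using the ranges in the text.

    Assumed thresholds: >= 14 nt for a specific primer, 16-24 nt preferred,
    and 30 nt as the upper limit for probing or PCR.
    """
    n = len(primer)
    if n < 14:
        return "too short for a specific primer"
    if 16 <= n <= 24:
        return "preferred (16-24 nt)"
    if n <= 30:
        return "acceptable (14-30 nt)"
    return "longer than about 30 nt"


# Hypothetical 21-nucleotide primer sequence, for illustration only
print(primer_length_category("ATGGCTGAAGTTCCAGTTCTC"))  # preferred (16-24 nt)
```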
If required, probing can be done with entire restriction fragments of the gene disclosed herein which may be 100's or even 1000's of Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the single stranded DNA fragments on the filter and binding determined. DNA for probing may be prepared from RNA preparations from cells. Probing may optionally be done by means of so- 27-31, for a review). A method described herein may employ the co-infiltration of a plurality of Agrobacterium tumefaciens strains each carrying one or more of the triterpenoid biosynthetic genes discussed above for concerted expression thereof in a biosynthetic pathway discussed above. In some embodiments, at least 2 or 3 different Agrobacterium tumefaciens strains are co-infiltrated e.g. each carrying a triterpenoid biosynthetic nucleic acid. The genes may be present from transient expression vectors. Vectors (typically binary vectors) for use as described herein may typically comprise an expression cassette comprising: (i) a promoter, operably linked to (ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated; (iii) a nucleic acid sequence encoding one or more biosynthetic genes as described above; (iv) a terminator sequence; and optionally ator sequence. Further examples of vectors and expression systems suitable for use as described herein are described below. A triterpenoid biosynthetic gene described above may be contained in or in the form of a recombinant and preferably replicable vector. 
A vector may include, inter alia, any plasmid, cosmid, phage or Agrobacterium binary vector in double or single stranded linear or circular form which may or may not be self-transmissible or mobilizable, and which can transform a prokaryotic or eukaryotic host either by integration into the cellular genome or by existing extrachromosomally (e.g. autonomous replicating plasmid with an origin of replication). Suitable expression vectors may include binary vectors for transient expression mediated by Agrobacterium tumefaciens (see for example Bevan et al Nucl Acid Res 1984 Nov 26; 12(22): 8711-8721). A binary vector system includes (a) border sequences which permit the transfer of a desired nucleotide sequence into a plant cell genome; (b) the desired nucleotide sequence itself, which will generally comprise an expression cassette of (i) a plant active promoter, operably linked to (ii) the target sequence and/or enhancer as appropriate. The desired nucleotide sequence is situated between the border sequences and is capable of being inserted into a plant genome under appropriate conditions. The binary vector system will generally require other sequence (derived from A. tumefaciens) to effect the integration. Generally this may be achieved by use of so-called "agro-infiltration" which uses Agrobacterium-mediated transient transformation. Briefly, this technique is based on the property of Agrobacterium tumefaciens to transfer a portion of its DNA ("T-DNA") into a host cell where it may become integrated into nuclear DNA. The T-DNA is defined by left and right border sequences which are around 21-23 nucleotides in length. The infiltration may be achieved e.g. by syringe (in leaves) or vacuum (whole plants). In the present invention the border sequences will generally be included around the desired nucleotide sequence (the T-DNA) with the one or more vectors being introduced into the plant material by agro-infiltration. Other suitable expression systems include the 'Hyper-Translatable' Cowpea Mosaic Virus ('CPMV-HT') system.
Suitable vectors based on pEAQ-HT expression plasmids for use in the CPMV-HT system are well known in the art (see for example WO2009/087391; Sainsbury et al (2009) Plant Biotechnol J 7(7): 682-693) Generally speaking, those skilled in the art are well able to construct vectors and design protocols for recombinant gene expression (e.g. for expressing a heterologous nucleic acid within a host or one or more cells of a host). Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. For further details see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al, 1989, Cold Spring Harbor Laboratory Press or Current Protocols in Molecular Biology, Second Edition, Ausubel et al. eds., John Wiley & Sons, 1992. Specifically included are shuttle vectors by which is meant a DNA vehicle capable, naturally or by design, of replication in two different host organisms, which may be selected from actinomycetes and related species, bacteria and eucaryotic (e.g. higher plant, mosses, yeast or fungal cells). A vector including nucleic acid described herein need not include a promoter or other regulatory sequence, particularly if the vector is to be used to introduce the nucleic acid into cells for recombination into the genome. Preferably the nucleic acid in the vector is under the control of, and operably linked to, an appropriate promoter or other regulatory elements for transcription in a host cell such as a microbial, e.g. yeast and bacterial, or plant cell. The vector may be a bi-functional expression vector which functions in multiple hosts. In the case of genomic DNA, this may contain its own promoter or other regulatory elements (optionally in combination with a heterologous enhancer, such as the 35S enhancer discussed in the Examples below). 
The advantage of using a native promoter is that this may avoid pleiotropic responses. In the case of cDNA this may be under the control of an appropriate promoter or other regulatory elements for expression in the host cell A promoter is a sequence of nucleotides from which transcription may be initiated of DNA operably linked downstream (i.e. in the 3' direction on the sense strand of double-stranded DNA). Operably linked means joined as part of the same nucleic acid molecule, suitably positioned and oriented for transcription to be initiated from the promoter. DNA operably linked to a promoter is "under transcriptional initiation regulation" of the promoter. Suitable promoters include inducible promoters. The term "inducible" as applied to a promoter is well understood by those skilled in the art. In essence, expression under the control of an inducible promoter is "switched on" or increased in response to an applied stimulus. The nature of the stimulus varies between promoters. Some inducible promoters cause little or undetectable levels of expression (or no expression) in the absence of the appropriate stimulus. Other inducible promoters cause detectable constitutive expression in the absence of the stimulus. Whatever the level of expression is in the absence of the stimulus, expression from any inducible promoter is increased in the presence of the correct stimulus. Thus nucleic acid described herein may be placed under the control of an externally inducible gene promoter to place expression (expressing the heterologous sequence) under the control of the user. An advantage of introduction of a heterologous gene into a plant cell, particularly when the cell is comprised in a plant, is the ability to place expression of the gene under the control of a promoter of choice, in order to be able to influence gene expression, and therefore QA or glycosylated QA biosynthesis, according to preference. Furthermore, mutants and derivatives of the wild-type gene, e.g. 
with higher or lower activity than wild-type, may be used in place of the endogenous gene. Also provided is a gene construct, preferably a replicable vector, comprising a promoter (optionally inducible) operably linked to a biosynthetic gene described herein or a variant thereof. Particularly of interest in the present context are nucleic acid constructs which operate as plant vectors. Specific procedures and vectors previously used with wide success upon plants are described by Guerineau and Mullineaux (1993) (Plant transformation and expression vectors. In: Plant Molecular Biology Labfax (Croy RRD ed.) Oxford, BIOS Scientific Publishers, pp 121-148). Suitable vectors may include plant viral- derived vectors (see e.g. EP-A-194809). Preferably the vectors which are for use in plants comprise border sequences which permit the transfer and integration of the expression cassette into the plant genome. Preferably the construct is a plant binary vector. Preferably the binary transformation vector is based on pPZP (Hajdukiewicz, et al.1994). Other example constructs include pBin19 (see Frisch, D. A., L. W. Harris- -409). Suitable promoters which operate in plants include the Cauliflower Mosaic Virus 35S (CaMV 35S). Other Press, Milton Keynes, UK. The promoter may be selected to include one or more sequence motifs or elements conferring developmental and/or tissue-specific regulatory control of expression. Inducible plant promoters include the ethanol induced promoter of Caddick et al (1998) Nature Biotechnology 16: 177-180. If desired, selectable genetic markers may be included in the construct, such as those that confer selectable phenotypes such as resistance to antibiotics or herbicides (e.g. kanamycin, hygromycin, phosphinotricin, chlorsulfuron, methotrexate, gentamycin, spectinomycin, imidazolinones and glyphosate). 
Positive selection system such as that described by Haldrup et al.1998 Plant molecular Biology 37, 287-296, may be used to make constructs that do not rely on antibiotics. As explained above, a preferred vector is a 'CPMV-HT' vector as described in WO2009/087391. The Examples below demonstrate the use of these pEAQ-HT expression plasmids. These vectors (typically binary vectors) for use in the present invention will typically comprise an expression cassette comprising: (i) a promoter, operably linked to (ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated; (iii) a nucleic acid sequence as described above; (iv) a terminator sequence; and optionally Enhancer sequences (or enhancer elements) are sequences derived from (or sharing homology with) the RNA-2 genome segment of a bipartite RNA virus, such as a comovirus, in which a target initiation site has been mutated. Such sequences can enhance downstream expression of a heterologous ORF to which they are attached. When present in transcribed RNA, such sequences may also enhance translation of a heterologous ORF to which they are attached. A target initiation site is the initiation site (start codon) in a wild-type RNA-2 genome segment of a bipartite virus (e.g. a comovirus) from which the enhancer sequence in question is derived, which serves as the initiation site for the production (translation) of the longer of two carboxy coterminal proteins encoded by the wild-type RNA-2 genome segment. Typically, the RNA virus will be a comovirus as described above. Most preferred vectors are the pEAQ vectors of WO2009/087391 which permit direct cloning version by use enhancer of the invention, positioned on a T-DNA which also contains a suppressor of gene silencing and an NPTII cassettes. The presence of a suppressor of gene silencing in such gene expression systems is preferred but not essential. 
Suppressors of gene silencing are known in the art and described in WO/2007/135480. They include HcPro from Potato virus Y, HC-Pro from TEV, P19 from TBSV, rgsCam, B2 protein from FHV, the small coat protein of CPMV, and coat protein from TCV. A preferred suppressor when producing stable transgenic plants is the P19 suppressor incorporating an R43W mutation. As described herein, a host may be converted from a phenotype whereby the host is unable to carry out an effective biosynthesis described herein to a phenotype whereby the host is able to carry out said biosynthesis, such that the product can be recovered therefrom or utilised in vivo to synthesise downstream products. Biosynthesis may include (i) the conversion of OS to QA or to an intermediate such as oleanolic acid or echinocystic acid; (ii) the conversion of QA to QA-Tri, or to an intermediate such as QA-Mono or QA-Di; (iii) the conversion of QA-Tri to QA-TriFRXX, or to an intermediate such as QA-TriF, QA-TriFR or QA-TriFRX; and/or (iv) the conversion of QA into SpB or an intermediate, such as QA-TriF(Q)RXX. 
Biosynthesis may also include (i) the conversion of OS into -amyrin (ii) the conversion of -amyrin to oleanolic acid (iii) the conversion of oleanolic acid to echinocystic acid (iv) the conversion of echinocystic acid to QA (v) the conversion of QA into 3-O- -D-glucopyranosiduronic acid]oxy}-quillaic acid QA-GlcA (vi) the conversion of into 3-O- -D-glucopyranosiduronic acid]oxy}-quillaic acid QA-GlcA into 3-O- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal 3-O- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA-GlcA-Gal into 3-O- - D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-quillaic acid QA- GlcA-Gal-Xyl - conversion of QA-Tri into 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-fucopyranosyl ester}-quillaic acid(QA- TriF) (ix) the conversion of QA-TriF into 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D- glucopyranosiduronic acid}-28-O- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA- TriFR) (x) the conversion of QA-TriFR into 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D- glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRX); (xi) the conversion of QA-TriFRX into 3-O- -D-xylopyranosyl-(1->3)- -D- galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl- (1->4)- -L-rhamnopyranosyl-(1->2)- -D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX); (xii) the conversion of QA-TriFRXX into 3-O- -D-xylopyranosyl- - -D-galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L-rhamnopyranosyl- - -D- quinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA-TriF(Q)RXX) and/or (xii) the conversion of QA-TriF(Q)RXX to 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic 
acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- 2)- -D-4-O- acetylquinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (SpB). As explained above, triterpenoid biosynthetic genes described herein may also be engineered into plants. Suitable techniques are available in the art (see for example WO 2019/122259). 2,3-oxidosqualene is ubiquitous in higher plants due to its role in sterol biosynthesis, so biosynthesis as described herein has wide applicability in plant hosts. Suitable plant hosts include any plant that is amenable to transformation with Agrobacterium spp. As discussed herein, additional activities may be employed when practising the methods described herein in microorganisms. Examples of suitable hosts include plants such as Nicotiana benthamiana and microorganisms such as yeast. These are discussed in more detail below. The invention may comprise transforming the host with heterologous nucleic acid as described above by introducing the biosynthetic nucleic acid into the host cell via a vector and causing or allowing recombination between the vector and the host cell genome to introduce a nucleic acid according to the present invention into the genome. In another aspect of the invention, there is provided a host cell transformed with a heterologous nucleic acid which comprises a plurality of triterpenoid biosynthetic nucleotide sequences each of which encodes a polypeptide which in combination have a biosynthesis activity described herein, wherein expression of said nucleic acid imparts on the transformed host the ability to carry out the biosynthesis or improves said ability in the host. The invention further encompasses a host cell transformed with triterpenoid biosynthetic nucleic acid or a vector as described above (e.g. comprising the biosynthesis modifying nucleotide sequences) especially a plant or a microbial cell. In the transgenic host cell (i.e. 
transgenic for the nucleic acid in question) the transgene may be on an extra-genomic vector or incorporated, preferably stably, into the genome. There may be more than one heterologous nucleotide sequence per haploid genome. The methods and materials described herein can be used, inter alia, to generate stable crop-plants that accumulate the biosynthetic triterpenoid saponin or other product. Examples of plants include row crops such as sunflower, potato, canola, dry bean, field pea, flax, safflower, buckwheat, cotton, maize, soybeans, and sugar beets. Major crop-plants such as corn, wheat, oilseed rape and rice may also be preferred hosts. Plants which include a plant cell according to the invention are also provided. Also provided are methods comprising introduction of such a construct into a plant cell or a microbial (e.g. bacterial, yeast or fungal) cell and/or induction of expression of a construct within a plant cell, by application of a suitable stimulus e.g. an effective exogenous inducer. As an alternative to microorganisms, cell suspension cultures of engineered glycosylated QA -producing plant species, including also the moss Physcomitrella patens, may be cultured in fermentation tanks (see e.g. Grotewold et al. (Engineering Secondary Metabolites in Maize Cells by Ectopic Expression of Transcription Factors, Plant Cell, 10, 721-740, 1998). Also provided is a host cell containing a heterologous construct described above, especially a plant or a microbial cell. The discussion of host cells above in relation to reconstitution of QA or glycosylated QA biosynthesis in heterologous organisms applies mutatis mutandis here. Also provided is a method of transforming a plant cell involving introduction of a construct as described above into a plant cell and causing or allowing recombination between the vector and the plant cell genome to introduce a nucleic acid described herein into the genome. 
The invention further encompasses a host cell transformed with nucleic acid or a vector described herein (e.g. comprising the triterpenoid biosynthetic nucleotide sequence) especially a plant or a microbial cell. In the transgenic plant cell (i.e. transgenic for the nucleic acid in question) the transgene may be on an extra- genomic vector or incorporated, preferably stably, into the genome. There may be more than one heterologous nucleotide sequence per haploid genome. Yeast has seen extensive employment as a triterpene-producing host and is therefore potentially well adapted for QA and then glycosylated QA biosynthesis as described herein, for example the biosynthesis of triterpenoid saponins. In some preferred embodiments, the host is a yeast. For such hosts, it may be desirable to introduce additional genes to improve the flux of QA, and hence QA or glycosylated QA production as described above. Examples may include one or more plant cytochrome P450 reductases (CPRs) to serve as the redox partner to the introduced P450s, as well as an HMGR. It may likewise be desirable to introduce additional genes to contribute other elements of the QA or improve QA glycosylation pathways. These may include enzymes providing UDP-sugar donors and the like (see e.g. Ohashi T, Hasegawa Y, Misaki R, Fujiyama K transferases and their application to flavonoid (2016). Applied Microbiology and Biotechnology.100(2): 687-696.); Oka T, Jigami Y. (2006). Reconstruction of de novo pathway for synthesis of UDP-glucuronic acid and UDP- xylose from intrinsic UDP-glucose in Saccharomyces cerevisiae . FEBS J.273(12):2645-57). In the light of the present disclosure, those skilled in the art can provide such ancillary activities as required. Plants, which include a plant cell transformed as described above, are also provided. If desired, following transformation of a plant cell, a plant may be regenerated, e.g. from single cells, callus tissue or leaf discs, as is standard in the art. 
Almost any plant can be entirely regenerated from cells, tissues and organs of the plant. Available techniques are reviewed in Vasil et al., Cell Culture and Somatic Cell Genetics of Plants, Vol I, II and III, Laboratory Procedures and Their Applications, Academic Press, 1984, and Weissbach and Weissbach, Methods for Plant Molecular Biology, Academic Press, 1989. In addition to the regenerated plant, also provided are the following: a clone of such a plant, seed, selfed or hybrid progeny and descendants (e.g. F1 and F2 descendants). Also provided is a plant propagule from such plants, that is any part which may be used in reproduction or propagation, sexual or asexual, including cuttings, seed and so on. In all cases these plants or parts include the plant cell or heterologous biosynthesis-modifying nucleic acid described above, for example as introduced into an ancestor plant. Also provided is any part of these plants (e.g. leaf, stem, dried or ground product, edible portion etc.), which in all cases includes the plant cell or heterologous triterpenoid biosynthetic DNA described above. The present invention also encompasses the expression product of any of the coding triterpenoid biosynthetic nucleic acid sequences disclosed and methods of making the expression product by expression from encoding nucleic acid therefor under suitable conditions, which may be in suitable host cells. As described below, plant backgrounds such as those above may be natural or transgenic e.g. for one or more other genes relating to biosynthesis of a triterpenoid, such as QA or glycosylated QA, or otherwise affecting that phenotype or trait. In modifying the host phenotypes, the triterpenoid biosynthetic nucleic acids described herein may be used in combination with any other gene, such as transgenes affecting the rate or yield of biosynthesis of a triterpenoid, such as QA or glycosylated QA, or its modification, or any other phenotypic trait or desirable property. 
By use of a combination of genes, plants or microorganisms (e.g. bacteria, yeasts or fungi) can be tailored to enhance production of desirable precursors or reduce undesirable metabolism. A triterpenoid biosynthetic sequence described herein may be used in vitro or in vivo to catalyse its respective biological activity. For example, a method of converting 2,3-oxidosqualene (OS) into β-amyrin may comprise contacting OS with a Saponaria officinalis β-amyrin synthase (SobAS) comprising the amino acid sequence of SEQ ID NO: 8 or a variant thereof, such that said OS is converted into β-amyrin. Also provided is the use of a Saponaria officinalis β-amyrin synthase (SobAS) comprising the amino acid sequence of SEQ ID NO: 8 or a variant thereof to convert OS into β-amyrin. A method of oxidising β-amyrin at the C28 position to a carboxylic acid may comprise contacting β-amyrin with a SoC28 oxidase polypeptide comprising the amino acid sequence of SEQ ID NO: 2 or a variant thereof, such that the C28 position of said β-amyrin is oxidised to a carboxylic acid to produce oleanolic acid. Also provided is the use of a SoC28 oxidase polypeptide comprising the amino acid sequence of SEQ ID NO: 2 or a variant thereof to oxidise the C28 position of β-amyrin to a carboxylic acid. A method of oxidising oleanolic acid at the C16 position to an alcohol to produce echinocystic acid may comprise contacting oleanolic acid with a SoC28C16 oxidase polypeptide comprising an amino acid sequence of SEQ ID NO: 4 or a variant thereof, such that the C16 position of said oleanolic acid is oxidised to an alcohol, thereby producing echinocystic acid. Also provided is the use of a SoC28C16 oxidase polypeptide comprising an amino acid sequence of SEQ ID NO: 4 or a variant thereof to oxidise the C16 position of oleanolic acid to an alcohol to produce echinocystic acid. 
A method of oxidising β-amyrin at the C16 position to an alcohol and the C28 position to a carboxylic acid to produce echinocystic acid may comprise contacting β-amyrin with a SoC28C16 oxidase polypeptide comprising an amino acid sequence of SEQ ID NO: 4 or a variant thereof, such that the C16 position of said β-amyrin is oxidised to an alcohol and the C28 position to a carboxylic acid, thereby producing echinocystic acid. Also provided is the use of a SoC28C16 oxidase polypeptide comprising an amino acid sequence of SEQ ID NO: 4 or a variant thereof to oxidise the C28 and C16 positions of β-amyrin to produce echinocystic acid. A method of oxidising echinocystic acid at the C-23 position to an aldehyde to produce quillaic acid (QA) may comprise contacting echinocystic acid with a SoC23 oxidase polypeptide comprising the amino acid sequence of SEQ ID NO: 6 or a variant thereof, such that the C-23 position of said echinocystic acid is oxidised to an aldehyde, thereby producing quillaic acid (QA). Also provided is the use of a SoC23 oxidase polypeptide comprising the amino acid sequence of SEQ ID NO: 6 or a variant thereof to oxidise the C-23 position of β-amyrin or an oxidised derivative thereof to an aldehyde to produce quillaic acid (QA). A method of converting quillaic acid (QA) into 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA) may comprise contacting QA with a Saponaria officinalis QA 3-O-glucuronosyl transferase SoCSL polypeptide comprising the amino acid sequence of SEQ ID NO: 10 or a variant thereof, such that said QA is converted into QA-GlcA. Also provided is the use of a SoCSL polypeptide comprising the amino acid sequence of SEQ ID NO: 10 or a variant thereof to convert QA into QA-GlcA. 
A method of converting 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA) into 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-Gal) may comprise contacting QA-GlcA with a Saponaria officinalis QA-GlcA galactosyl transferase SoC3Gal polypeptide comprising the amino acid sequence of SEQ ID NO: 12 or a variant thereof, such that said QA-GlcA is converted into QA-GlcA-Gal. Also provided is the use of a Saponaria officinalis QA-GlcA galactosyl transferase SoC3Gal polypeptide comprising the amino acid sequence of SEQ ID NO: 12 or a variant thereof to convert QA-GlcA into QA-GlcA-Gal. A method of converting 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-Gal) into 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid (QA-GlcA-[Gal]-Xyl or QA-Tri) may comprise contacting QA-GlcA-Gal with a Saponaria officinalis QA-GlcA-Gal xylosyl transferase SoC3Xyl polypeptide comprising the amino acid sequence of SEQ ID NO: 14 or a variant thereof, such that said QA-GlcA-Gal is converted into QA-Tri. Also provided is the use of a Saponaria officinalis QA-GlcA-Gal xylosyl transferase SoC3Xyl polypeptide comprising the amino acid sequence of SEQ ID NO: 14 or a variant thereof to convert QA-GlcA-Gal into QA-Tri. A method of converting QA-Tri into 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-fucopyranosyl ester}-quillaic acid (QA-TriF) may comprise contacting QA-Tri with a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu polypeptide comprising the amino acid sequence of SEQ ID NO: 16 or a variant thereof, such that said QA-Tri is converted into QA-TriF. Also provided is the use of a Saponaria officinalis QA-Tri fucosyl transferase SoC28Fu polypeptide comprising the amino acid sequence of SEQ ID NO: 16 or a variant thereof to convert QA-Tri into QA-TriF. 
A method of converting QA-TriF into 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFR) may comprise contacting QA-TriF with a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha polypeptide comprising the amino acid sequence of SEQ ID NO: 18 or a variant thereof, such that said QA-TriF is converted into QA-TriFR. Also provided is the use of a Saponaria officinalis QA-TriF rhamnosyl transferase SoC28Rha polypeptide comprising the amino acid sequence of SEQ ID NO: 18 or a variant thereof to convert QA-TriF into QA-TriFR. A method of converting QA-TriFR into 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFRX) may comprise contacting QA-TriFR with a Saponaria officinalis QA-TriFR xylosyl transferase SoC28Xyl1 polypeptide comprising the amino acid sequence of SEQ ID NO: 20 or a variant thereof, such that said QA-TriFR is converted into QA-TriFRX. Also provided is the use of a Saponaria officinalis QA-TriFR xylosyl transferase SoC28Xyl1 polypeptide comprising the amino acid sequence of SEQ ID NO: 20 or a variant thereof to convert QA-TriFR into QA-TriFRX. A method of converting QA-TriFRX into 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid (QA-TriFRXX) may comprise contacting QA-TriFRX with a Saponaria officinalis QA-TriFRX xylosyl transferase SoC28Xyl2 polypeptide comprising the amino acid sequence of SEQ ID NO: 22 or a variant thereof, such that said QA-TriFRX is converted into QA-TriFRXX. 
Also provided is the use of a Saponaria officinalis QA-TriFRX xylosyl transferase SoC28Xyl2 polypeptide comprising the amino acid sequence of SEQ ID NO: 22 or a variant thereof to convert QA-TriFRX into QA-TriFRXX. A method of converting QA-TriFRXX into 3-O- -D-xylopyranosyl- - -D-galactopyranosyl- - -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl- - -D-xylopyranosyl- - -L-rhamnopyranosyl- - -D-quinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA-TriF(Q)RXX) may comprise contacting QA-TriFRXX with a Saponaria officinalis QA-TriFRXX quinovosyl transferase SoGH1 polypeptide comprising the amino acid sequence of SEQ ID NO: 34 or a variant thereof, such that said QA-TriFRXX is converted into QA-TriF(Q)RXX. Also provided is the use of a Saponaria officinalis QA-TriFRXX quinovosyl transferase SoGH1 polypeptide comprising the amino acid sequence of SEQ ID NO: 34 or a variant thereof to convert QA-TriFRXX into QA-TriF(Q)RXX. A method of converting QA-TriF(Q)RXX into 3-O- -D-xylopyranosyl-(1->3)- -D-galactopyranosyl-(1->2)]- -D-glucopyranosiduronic acid}-28-O- -D-xylopyranosyl-(1->3)- -D-xylopyranosyl-(1->4)- -L-rhamnopyranosyl-(1->2)- 2)- -D-4-O-acetylquinovopyranosyl- - -D-fucopyranosyl ester}-quillaic acid (QA-TriF(Q-Ac)RXX) may comprise contacting QA-TriF(Q)RXX with a Saponaria officinalis QA-TriF(Q)RXX acetyl transferase SoBAHD1 polypeptide comprising the amino acid sequence of SEQ ID NO: 36 or a variant thereof, such that said QA-TriF(Q)RXX is converted into QA-TriF(Q-Ac)RXX. Also provided is the use of a Saponaria officinalis QA-TriF(Q)RXX acetyl transferase SoBAHD1 polypeptide comprising the amino acid sequence of SEQ ID NO: 36 or a variant thereof to convert QA-TriF(Q)RXX into QA-TriF(Q-Ac)RXX (SpB). In some embodiments, one or more of the nucleic acids or proteins described above may be used for the heterologous reconstitution of a biosynthetic pathway. 
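The stepwise conversions set out above can be summarised as an ordered lookup table. The following is a minimal illustrative sketch only: it assumes a strictly linear order of steps (the order in planta may differ, and SoC28C16 is also described as acting directly on β-amyrin), and the names `Step`, `PATHWAY` and `enzymes_between` are hypothetical helpers introduced here for illustration. The enzyme names and SEQ ID NOs are those recited in the preceding paragraphs.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    substrate: str
    product: str
    enzyme: str
    seq_id_no: int  # SEQ ID NO of the polypeptide, as recited in the text


# Assumed linear ordering of the conversions described in the text.
PATHWAY = [
    Step("OS", "beta-amyrin", "SobAS", 8),
    Step("beta-amyrin", "oleanolic acid", "SoC28", 2),
    Step("oleanolic acid", "echinocystic acid", "SoC28C16", 4),
    Step("echinocystic acid", "QA", "SoC23", 6),
    Step("QA", "QA-GlcA", "SoCSL", 10),
    Step("QA-GlcA", "QA-GlcA-Gal", "SoC3Gal", 12),
    Step("QA-GlcA-Gal", "QA-Tri", "SoC3Xyl", 14),
    Step("QA-Tri", "QA-TriF", "SoC28Fu", 16),
    Step("QA-TriF", "QA-TriFR", "SoC28Rha", 18),
    Step("QA-TriFR", "QA-TriFRX", "SoC28Xyl1", 20),
    Step("QA-TriFRX", "QA-TriFRXX", "SoC28Xyl2", 22),
    Step("QA-TriFRXX", "QA-TriF(Q)RXX", "SoGH1", 34),
    Step("QA-TriF(Q)RXX", "QA-TriF(Q-Ac)RXX", "SoBAHD1", 36),
]


def enzymes_between(start: str, end: str) -> list[str]:
    """Return the enzymes needed to convert `start` into `end` in the assumed order."""
    compounds = [s.substrate for s in PATHWAY] + [PATHWAY[-1].product]
    i, j = compounds.index(start), compounds.index(end)
    if i > j:
        raise ValueError(f"{end!r} is upstream of {start!r} in the assumed order")
    return [s.enzyme for s in PATHWAY[i:j]]


if __name__ == "__main__":
    # Enzymes for the conversion of QA to QA-Tri.
    print(enzymes_between("QA", "QA-Tri"))
```

A table of this kind may be convenient when deciding which subset of coding sequences to co-express for a given target intermediate; it does not capture ancillary activities such as SoSDR, CPRs or tHMGR.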
Biosynthetic pathways are described above and may include one or more of the conversion of OS to QA, the conversion of QA to QA-Tri, the conversion of QA-Tri to QA-TriFRXX and the conversion of QA-TriFRXX into QA-TriF(Q-Ac)RXX. Also further provided is a method of influencing or affecting biosynthesis in a host, such as a plant, the method comprising causing or allowing transcription of a heterologous triterpenoid biosynthetic nucleic acid as discussed above within the cells of the plant. The step may be preceded by the earlier step of introduction of the nucleic acid into a cell of the plant or an ancestor thereof. Biosynthesis may include the production of QA; a glycosylated QA, such as QA-Tri, QA-TriFRXX or QA-TriF(Q-Ac)RXX; or an intermediate of any one of these. Such methods will usually form a part of, possibly one step in, a method of producing a glycosylated QA (e.g. QA-Tri, QA-TriFRXX or QA-TriF(Q-Ac)RXX) in a host such as a plant. Preferably, the method will employ a triterpenoid biosynthetic polypeptide or a variant thereof, as described above, or nucleic acid encoding either. The methods described above may be used to generate QA or a glycosylated QA, such as QA-Tri, QA- TriFRXX or QA-TriF(Q-Ac)RXX, in a heterologous host, or may be used to generate an intermediate. The glycosylated QA will generally be non-naturally occurring in the species into which they are introduced. Triterpenoids, including glycosylated forms of QA, such as QA-Tri, QA-TriFRXX or QA-TriF(Q-Ac)RXX, from the plants or methods described herein may be isolated and commercially exploited. The methods above may form a part of, possibly one step in, a method of producing downstream products, such as QS-21 in a host. 
The method may comprise the steps of culturing the host (where it is a microorganism) or growing the host (where it is a plant) and then harvesting it and purifying the triterpenoid, for example a glycosylated QA, such as QA-Tri, QA-TriFRXX, or QA-TriF(Q-Ac)RXX or a downstream product or derivative (e.g. QS-21) product therefrom. The product thus produced forms a further aspect of the present invention. The utility of QS-21 is described above. Alternatively, glycosylated QA, such as QA-Tri, QA-TriFRXX, QA-TriF(Q-Ac)RXX, may be recovered to allow for further chemical synthesis of downstream compounds. The methods described herein embrace both the in vitro and in vivo production, or manipulation, of triterpenoids, such as QA and/or one or more glycosylated QAs. For example, triterpenoid biosynthetic polypeptides may be employed in fermentation via expression in microorganisms such as e.g. E.coli, yeast and filamentous fungi and so on. In some embodiments, one or more newly characterised triterpenoid biosynthetic sequences described herein may be used in these organisms in conjunction with one or more other biosynthetic genes. In vivo methods are described extensively above, and generally involve the step of causing or allowing the transcription of, and then translation from, a recombinant nucleic acid molecule encoding the triterpenoid biosynthetic polypeptides. In other embodiments, the triterpenoid biosynthetic polypeptides (enzymes) may be used in vitro, for example in isolated, purified, or semi-purified form. Optionally they may be the product of expression of a recombinant nucleic acid molecule. Down-regulation of genes in a host may be desired e.g. to reduce undesirable metabolism or fluxes which might impact on yield of triterpenoids, such as QA or glycosylated QA. Such down regulation may be achieved by methods known in the art, for example using anti-sense technology. 
In using anti-sense genes or partial gene sequences to down-regulate gene expression, a nucleotide sequence is placed under the control of a promoter in a "reverse orientation" such that transcription yields RNA which is complementary to normal mRNA transcribed from the "sense" strand of the target gene. See, for example, Rothstein et al, 1987; Smith et al,(1988) Nature 334, 724-726; Zhang et al,(1992) The Plant Cell 4, 1575-1588, English et al., (1996) The Plant Cell 8, 179-188. Antisense technology is also reviewed in Bourque, (1995), Plant Science 105, 125-149, and Flavell, (1994) PNAS USA 91, 3490-3496. An alternative to anti-sense is to use a copy of all or part of the target gene inserted in sense, that is the same, orientation as the target gene, to achieve reduction in expression of the target gene by co- suppression. See, for example, van der Krol et al., (1990) The Plant Cell 2, 291-299; Napoli et al., (1990) The Plant Cell 2, 279-289; Zhang et al., (1992) The Plant Cell 4, 1575-1588, and US-A-5,231,020. Further refinements of the gene silencing or co-suppression technology may be found in WO95/34668 (Biosource); Angell & Baulcombe (1997) The EMBO Journal 16,12:3675-3684; and Voinnet & Baulcombe (1997) Nature 389: pg 553. Double stranded RNA (dsRNA) has been found to be even more effective in gene silencing than both sense or antisense strands alone (Fire A. et al Nature, Vol 391, (1998)). dsRNA mediated silencing is gene specific and is often termed RNA interference (RNAi) (See also Fire (1999) Trends Genet.15: 358-363, Sharp (2001) Genes Dev.15: 485-490, Hammond et al. (2001) Nature Rev. Genes 2: 1110-1119 and Tuschl (2001) Chem. Biochem.2: 239-245). RNA interference is a two-step process. First, dsRNA is cleaved within the cell to yield short interfering RNAs (siRNAs) of about 21-23nt length with 5' terminal phosphate and 3' short overhangs (~2nt). The siRNAs target the corresponding mRNA sequence specifically for destruction (Zamore P.D. 
Nature Structural Biology, 8, 9, 746-750, (2001)). Another methodology known in the art for down-regulation of target sequences is the use of microRNA (miRNA), e.g. as described by Schwab et al 2006, Plant Cell 18, 1121-1133. This technology employs artificial miRNAs, which may be encoded by stem loop precursors incorporating suitable oligonucleotide sequences, which sequences can be generated using well defined rules in the light of the disclosure herein. In some embodiments, there is provided a method for influencing or affecting QA or glycosylated QA biosynthesis in a host, which method comprises any of the following steps: (i) causing or allowing transcription from a nucleic acid comprising the complement sequence of a host nucleotide sequence described herein, such that the respective encoded polypeptide activity is reduced by an antisense mechanism; (ii) causing or allowing transcription from a nucleic acid encoding a stem loop precursor comprising 20-25 nucleotides, optionally including one or more mismatches, of a host nucleotide sequence such that the respective encoded polypeptide activity is reduced by an miRNA mechanism; (iii) causing or allowing transcription from nucleic acid encoding double stranded RNA corresponding to 20-25 nucleotides, optionally including one or more mismatches, of a host nucleotide sequence such that the respective encoded polypeptide activity is reduced by an siRNA mechanism. It will be understood by those skilled in the art, in the light of the present disclosure, that additional genes may be utilised in the practice of the invention, to provide additional activities and/or improve expression or activity. These include those expressing co-factor or helper proteins, or other factors. It will be appreciated that where these generic terms are used in relation to any aspect or embodiment, the meaning or disclosure will be taken to apply mutatis mutandis to any of these sequences individually. 
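As a purely illustrative sketch of the sequence handling implied by steps (i) to (iii) above, the following shows how the complement (antisense) of a target sequence and the set of candidate 20-25 nucleotide windows might be enumerated. The function names and the example sequence are hypothetical placeholders and do not correspond to any sequence of the present disclosure; selection among candidate windows (e.g. according to the design rules of Schwab et al) is not attempted here.

```python
from __future__ import annotations

# DNA complement table (sense-strand letters only; an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (the antisense strand)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_windows(seq: str, length: int = 21) -> list[str]:
    """Enumerate every contiguous window of `length` nucleotides in `seq`.

    The 20-25 nt range follows the stem loop / dsRNA lengths recited in the text.
    """
    if not 20 <= length <= 25:
        raise ValueError("window length must be between 20 and 25 nucleotides")
    seq = seq.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    # Hypothetical 30-nt placeholder; not taken from any SEQ ID NO.
    target = "ATGGCTTCAGGTCCAGATCTTAACGGTCAT"
    windows = candidate_windows(target, length=21)
    print(len(windows), "candidate 21-nt windows")
    print("antisense of first window:", reverse_complement(windows[0]))
```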
Other aspects and embodiments of the invention provide the aspects and embodiments described above with the term "comprising" replaced by the term "consisting of", and the aspects and embodiments described above with the term "comprising" replaced by the term "consisting essentially of". It is to be understood that the application discloses all combinations of any of the above aspects and embodiments described above with each other, unless the context demands otherwise. Similarly, the application discloses all combinations of the preferred and/or optional features either singly or together with any of the other aspects, unless the context demands otherwise. Modifications of the above embodiments, further embodiments and modifications thereof will be apparent to the skilled person on reading this disclosure, and as such, these are within the scope of the present invention. All documents and sequence database entries mentioned in this specification are incorporated herein by reference in their entirety for all purposes. "and/or" where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example, "A and/or B" is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein.

Abbreviations

QA-GlcA-[Gal]-Xyl or QA-Tri - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid
QA-GlcA-Gal or QA-Di - 3-O-{[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-quillaic acid
QA-GlcA or QA-Mono - 3-O-{β-D-glucopyranosiduronic acid}-quillaic acid
QA-TriF - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-fucopyranosyl ester}-quillaic acid
QA-TriFR - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid
QA-TriFRX - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid
QA-TriFRXX - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-β-D-fucopyranosyl ester}-quillaic acid
QA-TriF(Q)RXX - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-[β-D-quinovopyranosyl-(1->4)]-β-D-fucopyranosyl ester}-quillaic acid
QA-TriF(Q-Ac)RXX or SpB or Saponarioside B - 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-[β-D-4-O-acetylquinovopyranosyl-(1->4)]-β-D-fucopyranosyl ester}-quillaic acid
QA - Quillaic acid
OS - 2,3-oxidosqualene
Gal - D-Galactopyranose
GlcA - D-Glucopyranuronic acid (additional numbers denote specific carbons, i.e. GlcA-1)
Xyl - D-Xylopyranose
Rha - L-Rhamnopyranose
Ac - acetyl group
Qui (or Q) - D-Quinovose
SobAS or SobAS1 - S. officinalis β-amyrin synthase
SoC28 oxidase or SoC28 or CYP716A378 - S. officinalis quillaic acid C28 oxidase
SoC16 oxidase or SoC28C16 oxidase or SoC28C16 or CYP716A379 - S. officinalis quillaic acid C28 and C16 oxidase
SoC23 oxidase or SoC23 or CYP72A984 - S. officinalis quillaic acid C23 oxidase
SoQA-GlcAT or SoCSL or SoCSL1 - S. officinalis QA 3-O glucuronosyl transferase
SoQA-GalT or SoC3Gal or UGT73DL1 - S. officinalis QA-GlcA galactosyl transferase
SoQA-XylT or SoC3Xyl or UGT3CC6 - S. officinalis QA-GlcA-Gal xylosyl transferase
SoQA-TriFuT or SoC28Fu or UGT74CD1 - S. officinalis QA-Tri fucosyl transferase
SoFuSyn or SoSDR - S. officinalis short chain dehydrogenase
SoQA-TriFRhaT or SoC28Rha or UGT79T1 - S. officinalis QA-TriF rhamnosyl transferase
SoQA-TriFRXylT or SoC28Xyl1 or UGT79L3 - S. officinalis QA-TriFR xylosyl transferase
SoQA-TriFRXXylT or SoC28Xyl2 or UGT73M2 - S. officinalis QA-TriFRX xylosyl transferase
SoGH1 - S. officinalis QA-TriFRXX quinovosyl transferase
SoBAHD1 - S. officinalis QA-TriF(Q)RXX acetyl transferase
tHMGR - Avena strigosa (diploid oat) truncated 3-hydroxy-3-methylglutaryl-CoA reductase

Materials and Methods

RNA synthesis and RNA-seq analysis

Total RNA was extracted from leaf and root of a representative soapwort plant using the RNeasy Plant Mini kit (Qiagen) with a modified protocol described in [MacKenzie et al (1997) Plant Disease.81: 222-226]. Along with RNA extraction, on-column DNase digestion was performed using RQ1 RNase-Free DNase (Promega). RNA using GoScriptTM Reverse Transcriptase (Prom A total of 24 RNA samples were sent to the Earlham Institute (EI) for transcriptome sequencing and RNA-seq analysis. NEBNext Ultra II Directional RNA-Seq libraries were constructed from the 24 samples and sequenced on two lanes of a NovaSeq 6000 SP flow cell (150 bp paired-end reads).
Transcriptome assembly was performed by EI using the Trinity de novo assembler (ver.2.8.5); ORF prediction and functional annotation were performed using TransDecoder (ver.5.5.0) and Automated Assignment of Human Readable Descriptions (AHRD, ver.3.3.3), respectively. Transcript quantification was also provided by EI using salmon (ver.0.14.1).

Identification of candidate genes

To identify candidate bAS in soapwort, the S. officinalis transcriptome was obtained from the 1,000 Plants (1KP) project (www.onekp.com) [Wickett et al (2014) PNAS 111(45) E4859-4868]. A BLASTP search was performed against a translated S. officinalis protein database using previously characterized OSCs from other plant species listed in Table 1 as queries. The list of soapwort candidates was filtered by removing sequences shorter than 500 amino acids (aa). The list was further filtered by phylogenetic analysis in MEGA-X (http://www.megasoftware.net). An amino acid alignment of the putative soapwort genes and the published OSCs from other plants listed in Table 1 was made using the MUSCLE algorithm (https://www.ebi.ac.uk/Tools/msa/muscle/). The alignment was used to create a phylogenetic tree using the neighbour-joining algorithm (Poisson model) with 1,000 bootstrap replicates. Based on the phylogenetic analysis, candidates that were unlikely to be bAS were removed from the list. After identifying SobAS, all other pathway candidates were identified using the newly assembled S. officinalis transcriptome produced by EI. Preliminary lists of candidate soapwort CYP450s, CSLs and UGTs were each created by performing a BLASTP search against the new soapwort transcriptome using literature gene families as queries. The lists were filtered by removing candidates shorter than 500 aa. To further refine the lists, correlation analysis was performed to find candidates with an expression pattern similar to that of SobAS. All bioinformatic analyses were performed in R.
The transcript quantification results from salmon were read in using tximport (ver.1.18.0). DESeq2 (ver.1.30.1) was used to generate rlog library-normalized method.

cDNA synthesis and Gateway® cloning

For cloning of candidate soapwort genes, a cDNA pool was generated from leaf and root RNA. First-strand cDNA synthesis was performed using the GoScript Reverse transcription system (Promega) following the manufacturer's protocol. The coding sequences of candidate soapwort genes except SoGH1 were PCR amplified from the cDNA pool using gene-specific primers with 5' attB sites. The coding sequence of SoGH1 was synthesized by IDT. The PCR product was purified using the QIAquick PCR Purification kit following the manufacturer's protocol. Gateway® technology (Invitrogen) was used to transfer the purified PCR product into the entry vector and eventually into the expression vector. Briefly, BP recombination reaction was performed following the purified PCR product and were subsequently heat-shock transformed into chemically competent Escherichia coli cells , ThermoFisher Scientific). Plasmids were recovered by plasmid preparation using the QIAprep Spin Miniprep Kit (Qiagen) and were sequence verified. To generate expression 50 ng each) of the entry vector carrying the gene of interest and pEAQ-HT-DEST1 expression vector [Sainsbury et al (2009) Plant Biotechnol J 7(7): 682-693]. Plasmids were recovered again using the QIAprep Spin Miniprep Kit following the manufacturer's protocol.

Transient expression of candidate genes in N. benthamiana

Agrobacterium tumefaciens strain LBA4404 (Invitrogen) was used for transient expression of candidate genes in Nicotiana benthamiana. Agroinfiltration, sample harvest and preparation were performed as previously described in [Reed et al (2017) Metabolic Engineering, 42, 185-193].

GC-MS analysis

The GC-MS analysis was performed using an Agilent 7890B fitted with a Zebron AB5-HT Inferno Column (Phenomenex) using a 20-minute method program developed by James Reed (Osbourn laboratory).
Briefly, the oven temperature was held for 2 min at 170 °C, then ramped to 300 °C at a rate of 20 °C/min and held at 300 °C for 11.5 min, for a total run time of 20 min. Mass spectrometry was performed using an Agilent 5977A Mass Selective Detector in scan mode from 60-800 m/z after a solvent delay of 8 min. MassHunter workstation (Agilent) was used to analyse the resulting data.

The LC-MS analysis was performed using a Shimadzu Prominence HPLC system fitted with an IT-TOF mass spectrometer (Shimadzu) using aqueous formic acid (0.1% v/v) as solvent A and acetonitrile as solvent B. The samples were analysed using a Kinetex XB- quipped with an electrospray in negative ionization mode (capillary temperature 250 °C, nebulizing gas 1.3 L/min, heat block temperature 300 °C, spray voltage -3.5 kV). The elution profile was as follows: 0 - 1 min, 5% B in A; 1 - 10 min, 55% B in A; 10 - 12 min, 100% B; 12 - 13 min, 100% B; 13 - 13.1 min, 5% B in A; 13.1 - 15.6 min, 5% B in A. MS/MS was used to monitor daughter ion formation. LCMSolution software (Shimadzu) was used for data acquisition and processing. All authentic saponarioside pathway intermediate standards were provided by members of the Osbourn group.

S. officinalis hairy root generation and transformation

Seeds of S. officinalis were collected from plants growing in the JIC glasshouse. After washing with sterile water, seeds were kept in sterile water for 3-4 h and surface sterilized in sodium hypochlorite (5% w/v) for 30 min, followed by three washes with sterile water. The seeds were then washed for 1 min in 70% ethanol (v/v), followed by three washes with sterile water. The seeds were germinated on MS (Murashige and Skoog 1962) medium (pH 5.88) with 3% sucrose and 0.8% agar. Plantlets were sub-cultured after 4 weeks and maintained in MS medium (pH 5.88) with 3% sucrose and 0.8% agar at 25 °C with a 16 h light photoperiod.
Hairy root induction was performed with strain ATCC15834, which was found to be efficient (100% induction) compared with the other tested strains (A4, A4RS, and LBA1334). Briefly, leaf explants were injected with the respective bacterial solutions (100 µM acetosyringone in MS, 1% sucrose, OD: 0.6) using a needle, with ~5 injections per leaf explant. The infected explants were kept for 4 days in the dark at 25 °C on co-cultivation medium comprising semi-solid (0.8% agar) MS medium supplemented with 3% sucrose and 100 µM acetosyringone. The explants were then transferred to semi-solid (0.8% agar) MS medium supplemented with 3% sucrose, 500 mg/l cefotaxime and 50 mg/l kanamycin and kept at 25 °C with a 16 h photoperiod until the bacteria were removed and the desired hairy roots appeared. Primers for silencing were designed from unique regions of S. officinalis β-amyrin synthase (SobAS) and cloned in pDONR207 (a Gateway-compatible vector). The subcloning was done in pK7WGIGW-2R, which offers dsRNA-mediated transgene silencing. For overexpression of SobAS, the full-length sequence was cloned in pK7WG2R using Gateway technology. The control hairy roots were raised using empty pK7WG2R (Zhao et al., 2016). All the constructs were transformed into ATCC15834 and, after co-cultivation with wounded leaves, the transgenic nature of the hairy roots was assessed by dsRED fluorescence and PCR. Three-week-old dsRED-expressing hairy roots grown on liquid B5 medium (with vitamins and sucrose) in the dark were used for metabolite analysis.

Results

Identification and characterization of SobAS based on phylogeny

The first committed step of triterpene biosynthesis is predicted to be the production of β-amyrin catalysed by an oxidosqualene cyclase (OSC), β-amyrin synthase (bAS). To identify candidate bASs in soapwort, we mined the translated S.
officinalis transcriptome available from the 1,000 Plants (1KP) project (www.onekp.com; [Wickett et al supra]) and performed a reciprocal BLASTP search using previously characterized OSCs from other plant species as search queries (Table 1). After phylogenetic analysis, SobAS was identified as a likely soapwort bAS candidate. To test the activity of SobAS, we transiently expressed SobAS in Nicotiana benthamiana together with the truncated HMG-CoA reductase (tHMGR) to increase the flux through the MVA pathway [Reed et al 2017 supra]. The full open reading frames of SobAS and tHMGR were transformed into Agrobacterium tumefaciens and were co-infiltrated into leaves of N. benthamiana. The infiltrated leaves were harvested 4 days post-infiltration, and the metabolites were extracted and analyzed using GC-MS. The transient expression of SobAS in N. benthamiana led to the formation of peak 1 with m/z 498, which corresponded to the commercial β-amyrin standard in both retention time and mass spectrum (Figure 4a). Peak 1 was not present in leaves expressing only tHMGR, which served as a negative control (Figure 4a). Based on these results, the candidate SobAS is identified as an OSC capable of cyclizing oxidosqualene into β-amyrin (Figure 4b).

Identification of saponarioside pathway genes by co-expression analysis

As the publicly available soapwort transcriptome from the 1KP project lacks any organ-specific transcriptome data, we performed RNA-seq analysis on six different soapwort organs (flower, flower bud, young leaf, old leaf, stem, root) differing in saponin content. The new soapwort transcriptome was used for further gene identification instead of the transcriptome available from the 1KP project. Following the biosynthesis of β-amyrin, the next predicted step in saponarioside biosynthesis is the oxidation of β-amyrin to quillaic acid by three cytochrome P450s (CYP450s).
To create a list of candidate soapwort CYP450s, a BLASTP search was performed against the newly assembled soapwort transcriptome using literature CYP450s from the TriForC database (http://bioinformatics.psb.ugent.be/triforc/, [Miettinen et al (2017) Nature Comms 8(1) 1-13]) as queries. This list was refined by removing any candidates shorter than 500 aa. To further refine the candidate list, Pearson co-expression analysis was performed with the expression profile of SobAS, and any candidate with a Pearson correlation coefficient (PCC) less than 0.80 was filtered out from the candidate list (Table 3). The next step in the saponarioside biosynthetic pathway is predicted to be the decoration of quillaic acid by Family 1 UDP-dependent glycosyltransferases (UGTs). To identify candidate UGTs in soapwort, a list of previously characterized UGTs from other plant species was obtained from [Louveau et al (2019) Cold Spring Harbor Perspectives in Biology, 11(12), a034744] and was used as a BLASTP query against the S. officinalis transcriptome. The list of candidates was refined in the same way as above. Pearson co-expression analysis was performed using the expression profile of SobAS, and any candidates with a PCC value less than 0.90 were filtered out of the list (Table 4). In addition to the UGTs, recent findings by Jozwiak and co-workers and by members of the Osbourn group have illustrated the ability of cellulose synthase like (CSL) genes to glucuronidate triterpene saponins (Jozwiak et al., 2020; WO/2020/260475). As such, we also searched for candidate CSLs in the soapwort transcriptome. A list of literature CSLs from other plant species was obtained from Reed et al., (In preparation) and used as a BLASTP query against the soapwort transcriptome. The list of candidate soapwort CSLs was further refined by performing Pearson co-expression analysis using the expression profile of SobAS. Any candidate soapwort CSLs with PCC values less than 0.85 were filtered from the list (Table 5).
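The candidate filtering described above (a minimum length of 500 aa followed by a Pearson correlation cut-off against the SobAS expression profile) can be illustrated with a minimal sketch. The analysis reported here was performed in R; the Python below is only an illustration under stated assumptions. The function names, candidate names, sequence lengths and expression values are hypothetical placeholders and are not data from this work.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def filter_candidates(candidates, bait_profile, min_length=500, min_pcc=0.80):
    """Drop candidates shorter than min_length aa or with PCC below min_pcc.

    candidates maps a name to (protein length in aa, expression profile).
    Returns (name, PCC) pairs sorted from highest to lowest PCC.
    """
    kept = []
    for name, (length, profile) in candidates.items():
        if length < min_length:
            continue
        pcc = pearson(profile, bait_profile)
        if pcc >= min_pcc:
            kept.append((name, pcc))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical normalised expression values across six organs
# (flower, flower bud, young leaf, old leaf, stem, root).
bait = [9.0, 8.5, 7.0, 4.0, 3.0, 2.0]
candidates = {
    "cand_A": (520, [8.8, 8.4, 7.2, 4.1, 3.2, 2.1]),  # long enough, tracks the bait
    "cand_B": (510, [2.0, 3.0, 4.0, 7.0, 8.5, 9.0]),  # long enough, anti-correlated
    "cand_C": (300, [9.0, 8.5, 7.0, 4.0, 3.0, 2.0]),  # same profile as bait, too short
}
result = filter_candidates(candidates, bait, min_length=500, min_pcc=0.80)
for name, pcc in result:
    print(f"{name}\tPCC={pcc:.3f}")
```

The `min_pcc` argument corresponds to the gene-family-specific cut-offs quoted above (0.80 for CYP450s, 0.90 for UGTs, 0.85 for CSLs).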
The identified putative saponarioside biosynthetic genes all shared a similar expression profile across the different soapwort organs, suggesting their involvement in the same biosynthetic pathway (Figure 3). The candidates were further refined based on high co-expression (PCC > 0.88) with the SobAS1 bait gene and ranked using PCC, annotation and high absolute transcript count in the flower (Figure 3).

Characterization of candidate genes by transient expression in N. benthamiana

Candidate saponarioside biosynthetic genes identified above were transiently expressed in N. benthamiana to test their activity. The open reading frames (ORFs) of candidate genes were either PCR amplified using primers listed in Table 2 or synthesized with upstream attB sites to allow for Gateway® cloning. The amplified or synthesized gene fragments were cloned into pDONR207 and were transferred into the plant expression vector pEAQ-HT-DEST1 [Sainsbury et al 2009 supra]. The expression constructs were individually transformed into Agrobacterium tumefaciens (LBA4404) for transient expression in N. benthamiana. In all experiments, an A. tumefaciens strain carrying tHMGR was co-infiltrated to enhance triterpene production in N. benthamiana. By screening the activity of the top candidates in Tables 3-5 and Figure 3, SoC28, SoC28C16, SoC23, SoCSL, SoC3Gal, SoC3Xyl, SoC28Fu, SoC28Rha, SoC28Xyl1, SoC28Xyl2, SoGH1 and SoBAHD1 were identified. To test the activity of SoC28 and SoC28C16, N. benthamiana leaves were co-infiltrated with A. tumefaciens strains each carrying the ORFs of (i) tHMGR + SobAS + SoC28 or (ii) tHMGR + SobAS + SoC28C16. The leaves were harvested 4 days after infiltration and the metabolites were extracted and analyzed using GC-MS. The co-expression of SobAS with SoC28 in N. benthamiana led to the formation of peak 2 with m/z 585 (Figure 5b). The retention time (RT), m/z and mass spectra of peak 2 present in the N.
benthamiana extract corresponded with peak 2 found in the commercial oleanolic acid standard; therefore, peak 2 was identified as oleanolic acid (Figure 5b). Interestingly, extracts from N. benthamiana leaves co-infiltrated with SobAS and SoC28C16 also produced oleanolic acid and an additional metabolite peak with m/z 570 (peak 3). The RT and mass spectra of peak 3 found in the N. benthamiana extract corresponded with peak 3 found in the echinocystic acid standard; thus peak 3 was identified as echinocystic acid. Neither peak 2 nor peak 3 was detected in the N. benthamiana leaves expressing only tHMGR, used as a negative control. Based on these results, SoC28 is likely to be a CYP450 with C28 oxidation activity, leading to the formation of oleanolic acid from β-amyrin, while SoC28C16 is likely to be a CYP450 with both C28 and C16 oxidation activity, leading to the production of both oleanolic acid and echinocystic acid. The activity of SoC23 was tested by co-infiltrating N. benthamiana leaves with A. tumefaciens strains each carrying the ORFs of tHMGR + SobAS + SoC28C16 + SoC23. The extracts of the harvested leaves were analyzed using HPLC-MS in negative ionization mode. The expression of SoC23 led to the production of peak 4 with m/z 485.3, which corresponds to the [M-H]- of quillaic acid (Figure 5b). Furthermore, the retention time and the mass spectra of peak 4 matched those of the peak observed in the quillaic acid standard. Peak 4 was not detected in the negative control, where N. benthamiana leaves were expressing only tHMGR. Based on this result, SoC23 is likely to be a CYP450 with C-23 oxidation activity. Following the biosynthesis of quillaic acid using genes from S. officinalis, candidate SoCSL was co-expressed with the genes required to produce quillaic acid (tHMGR + SobAS + SoC28C16 + SoC23). The extracts of the harvested leaves were analyzed using HPLC-MS in negative ionization mode.
The HPLC-MS analysis revealed the production of peak 5 with m/z 661.3, the expected [M-H]- of QA-Mono, upon the addition of SoCSL (Figure 6). Peak 5 was not detected in the negative control expressing only tHMGR, and the RT and mass spectra of peak 5 matched those of the authentic QA-Mono standard from the Osbourn group (Figure 6). The MS/MS fragmentation pattern of peak 5 also showed the main fragment ion to be m/z 485.33, which corresponds to the expected [M-H]- of quillaic acid. Based on the above results, peak 5 was identified as QA-Mono and SoCSL as a CSL able to glucuronidate quillaic acid. Next, the candidate SoC3Gal was co-expressed with the genes required to produce quillaic acid (tHMGR + SobAS + SoC28C16 + SoC23) and the newly characterized SoCSL. As above, the harvested leaf extracts were analyzed using HPLC-MS. Plant extracts expressing only the genes producing quillaic acid and SoCSL were used as a negative control. The addition of SoC3Gal resulted in the production of a new peak (peak 6) with m/z 823.4, which corresponds to the [M-H]- of QA-Di (Figure 7). Furthermore, the RT and mass spectra of peak 6 produced by SoC3Gal matched those of the peak produced by the authentic QA-Di standard. Additionally, the MS/MS fragmentation pattern revealed the major fragment ion of peak 6 to be m/z 485.32, corresponding to the [M-H]- of quillaic acid, which suggests the loss of the sugar chain from QA-Di (Figure 7). Based on these results, peak 6 in Figure 7 was putatively identified as QA-Di, and SoC3Gal as a galactosyltransferase from S. officinalis. The candidate SoC3Xyl was characterized next. The genes required to produce QA-Di (tHMGR + SobAS + SoC28C16 + SoC23 + SoCSL + SoC3Gal) were co-expressed with SoC3Xyl in N. benthamiana. The harvested leaf extracts were analyzed using HPLC-MS for a new gene product with an expected m/z of 955.4, corresponding to the [M-H]- of QA-Tri.
While the negative control co-expressing only the genes required to produce up to QA-Di did not produce any peak at the expected m/z, a new peak (peak 7) with m/z 955.4 was observed with the additional expression of SoC3Xyl (Figure 8). Not only did peak 7 have the same RT and mass spectra as the authentic QA-Tri standard, but MS/MS fragmentation also revealed the major ions to be m/z 823.42 [M - H - Xyl]- and m/z 485.33 [M - H - Xyl - Gal - GlcA]- (Figure 8). Based on these results, peak 7 in Figure 8 was putatively identified as QA-Tri and the SoC3Xyl candidate as a xylosyltransferase. Our next focus was to characterize a sugar transferase with the activity to transfer D-fucose to QA-Tri. Previous research by the Osbourn group has identified two genes, QsC28Fu and QsFuSyn, involved in the addition of D-fucose in the QS-21 biosynthetic pathway. QsC28Fu was revealed to have UDP-4-keto-6-deoxy-glucose transferase activity, while QsFuSyn was a 4-keto-reductase (Reed, Orme, El-Demerdash et al., 2023). In the process of this discovery, SoFuSyn was also identified and characterized as converting UDP-4-keto-6-deoxy-glucose to UDP-D-fucose. The SoC28Fu candidate gene was identified through co-expression analysis with SobAS, and we tested the activity of the candidate gene by transient expression in N. benthamiana. The combination of genes required to produce QA-Tri (tHMGR + SobAS + SoC28C16 + SoC23 + SoCSL + SoC3Gal + SoC3Xyl) was co-expressed in N. benthamiana with the candidate SoC28Fu and the previously characterized SoFuSyn. Following harvest, the leaves were extracted and analyzed by HPLC-MS. Peak 8 (m/z 1101.5) was produced by the additional activity of SoC28Fu and matched, in RT and mass spectra, the peak produced by the authentic QA-TriF standard (Figure 9). The peak was not detected in the negative control without SoC28Fu (Figure 9).
Furthermore, the MS/MS fragmentation pattern of peak 8 revealed the major daughter ions to be m/z 955.4, the expected [M-H]- of QA-Tri, and m/z 485.3, the [M-H]- of quillaic acid (Figure 9). These results suggest that the candidate SoC28Fu, together with SoFuSyn, may transfer a fucose moiety to QA-Tri. Next, the activity of candidate SoC28Rha was tested. The combination of genes required to produce QA-TriF (tHMGR + SobAS + SoC28C16 + SoC23 + SoCSL + SoC3Gal + SoC3Xyl + SoFuSyn + SoC28Fu) was co-expressed with SoC28Rha in N. benthamiana. The harvested leaf extracts were analyzed using HPLC-MS in negative ionization mode. Leaf extracts expressing only the genes required to produce QA-TriF were used as a negative control. Peak 9 with the expected [M-H]- of QA-TriFR, m/z 1247.5, was only detected in leaf extracts additionally expressing SoC28Rha (Figure 10). Furthermore, the MS/MS fragmentation of peak 9 showed the major fragment ions to be m/z 955.4, corresponding to the [M-H]- of QA-Tri, and m/z 485.3, corresponding to the [M-H]- of quillaic acid, suggesting the loss of the C-28 sugar chain, followed by the C-3 sugar chain (Figure 10). Based on these results, we putatively identified peak 9 in Figure 10 as QA-TriFR, and SoC28Rha as a rhamnosyltransferase. The next two enzymes that were characterized are SoC28Xyl1 and SoC28Xyl2. To test SoC28Xyl1, the genes required to produce QA-TriFR (tHMGR + SobAS + SoC28C16 + SoC23 + SoCSL + SoC3Gal + SoC3Xyl + SoFuSyn + SoC28Fu + SoC28Rha) were co-expressed with the candidate SoC28Xyl1 in N. benthamiana. Extracts from leaves expressing only the genes required to produce QA-TriFR were used as a negative control. Peak 10 with m/z 1379.6, the expected [M-H]- of QA-TriFRX, was only detected in samples expressing SoC28Xyl1 together with the genes required to produce the substrate, QA-TriFR (Figure 11).
The MS/MS fragmentation revealed the major fragment ions of peak 10 to be m/z 955.4 and m/z 485.3, which suggests the loss of the C-28 sugar chain to yield QA-Tri, followed by the loss of the C-3 sugar chain, yielding quillaic acid (Figure 11). The activity of SoC28Xyl2 was determined similarly to that of SoC28Xyl1. The genes required to produce QA-TriFRX (tHMGR + SobAS + SoC28C16 + SoC23 + SoCSL + SoC3Gal + SoC3Xyl + SoFuSyn + SoC28Fu + SoC28Rha + SoC28Xyl1) were co-expressed with the candidate SoC28Xyl2 in N. benthamiana. Tobacco leaves expressing the genes required to produce QA-TriFRX without the SoC28Xyl2 candidate were used as a negative control. The HPLC-MS analysis revealed that peak 11, with the expected [M-H]- of m/z 1511.6, was only observed in samples expressing SoC28Xyl2 (Figure 12). Furthermore, the MS/MS analysis showed the major fragment ions of peak 11 to be m/z 1379.6 [M - H - Xyl]-, m/z 955.4 [M - H - FRXX]- and m/z 485.3, the [M-H]- of quillaic acid (Figure 12). Overall, these results suggest that SoC28Xyl1 and SoC28Xyl2 are xylosyltransferases in S. officinalis. Thus far we have elucidated the genes and enzymes required for the biosynthesis of QA-TriFRXX (11). The steps responsible for the transfer of 4-O-acetylquinovose to 11 remain to be elucidated to complete the biosynthetic pathway to saponarioside B. Although GTs associated with plant natural product biosynthesis typically belong to family 1 of the GT superfamily, none of the UGTs in our main candidate list showed quinovosyltransferase activity towards 11. We therefore expanded our search for candidates by reviewing genes highly co-expressed with SobAS1 and noticed a glycosyl hydrolase family 1 (GH1) candidate exhibiting a high level of co-expression (PCC = 0.971) with SobAS1 (Figure 3). We investigated the activity of SoGH1 against 11 using Agrobacterium-mediated transient expression in N. benthamiana.
When SoGH1 was co-expressed with the biosynthetic genes for 11, two new products (12 and 12’) with different RTs but the same mass ([M-H]- = 1657.7 m/z), corresponding to the anticipated mass of 11 plus a deoxyhexose, were observed (Figure 13). In an attempt to distinguish the two products, we performed tandem MS analysis on 12 and 12’, which produced the same fragmentation pattern. The main fragment ions were 1525.7 m/z ([M-H]- of QA-TriFRXX) and 955.4 m/z ([M-H]- of QA-Tri), which suggested a loss of deoxyhexose, followed by the loss of the entire C-28 sugar chain, resulting in QA-Tri. We then compared 12 and 12’ with our authentic standard of 3-O-{β-D-xylopyranosyl-(1->3)-[β-D-galactopyranosyl-(1->2)]-β-D-glucopyranosiduronic acid}-28-O-{β-D-xylopyranosyl-(1->3)-β-D-xylopyranosyl-(1->4)-α-L-rhamnopyranosyl-(1->2)-[β-D-quinovopyranosyl-(1->4)]-β-D-fucopyranosyl ester}-quillaic acid (12, hereafter abbreviated as QA-TriF(Q)RXX) and observed that, although the fragmentation of both 12 and 12’ matched that of QA-TriF(Q)RXX, only 12 had the same RT as the QA-TriF(Q)RXX standard. Based on these results, SoGH1 may be involved in the transfer of D-quinovose to QA-TriFRXX. With the successful pathway elucidation to 12, only an acetylation step remained to complete the biosynthetic pathway to SpB (13). We screened the functions of the BAHD ATs in our main candidate list in Figure 3 by transient expression in N. benthamiana leaves. LC-MS analysis of the resulting leaf extracts revealed that the co-expression of SoBAHD1, in combination with the gene set to produce 12, led to the formation of two new products (13 and 13’) with the expected mass corresponding to SpB ([M-H]- = 1699.7 m/z). Furthermore, tandem MS analysis revealed the same fragmentation pattern for both 13 and 13’. The major fragment ions were 1657.7 m/z ([M-H]- of 12) and 955.5 m/z ([M-H]- of 7), suggesting the loss of an acetyl group, followed by the loss of the entire C-28 sugar chain (Figure 14).
However, only 13 produced by heterologous expression of SoBAHD1 corresponded in both RT and fragmentation pattern with the authentic SpB standard. Based on these results, we identified 13 as SpB, produced by the acetylation of D-quinovose in 12 by SoBAHD1, and SoBAHD1 as an acetyltransferase with the ability to transfer an acetyl moiety to QA-TriF(Q)RXX to produce SpB. The sequence similarity between the saponarioside biosynthetic genes identified here and their counterparts in Q. saponaria involved in QS-21 biosynthesis was compared using amino acid sequences (Table 6). Although the first few genes showed high similarity in amino acid sequence, the rest of the pathway genes showed overall low sequence similarity. This suggests that the two pathways are likely to have arisen independently and provides evidence for convergent evolution. The biosynthetic pathway of saponariosides that has been discussed here is illustrated in Figure 15. However, the steps may occur in a different order in planta. To investigate the role of the characterized genes in planta, hairy roots were successfully generated from soapwort seedlings. As a proof of concept, we silenced expression of SobAS1 in soapwort hairy roots and compared the metabolic profiles of the SobAS1-silenced hairy roots with those of DsRED-expressing control hairy roots. Although β-amyrin was not detected in either the control or the silenced hairy roots (Figure 16), cycloartenol accumulated in the SobAS1-silenced line only (Figure 17). This may suggest that the silencing of SobAS1 in soapwort hairy roots resulted in increased flux towards the sterol biosynthetic pathway. Further LC/MS analysis revealed that SobAS1-silenced hairy roots do not accumulate quillaic acid, while an abundant amount of quillaic acid is detected in the control hairy root line (Figure 18). In agreement with this result, SpB is undetectable in SobAS1-silenced hairy roots while SpB is detected in the control roots (Figure 19).
Overall, these results show that SobAS1 is indeed an OSC responsible for β-amyrin biosynthesis in S. officinalis.
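The expected [M-H]- values quoted in the Results for each pathway intermediate follow from adding the monoisotopic mass of one sugar (or acetyl) residue per enzymatic step to the [M-H]- of quillaic acid. The sketch below illustrates that arithmetic only; it is not part of the reported analysis. It assumes the molecular formula C30H46O5 for quillaic acid and treats each glycosylation as the addition of an anhydro-sugar residue (sugar minus H2O). The names `mono_mass` and `ladder` and the label "dHex" (deoxyhexose: fucose, rhamnose or quinovose) are introduced here for illustration.

```python
# Monoisotopic masses of the elements used, and of the proton lost in [M-H]-.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647

# Assumed elemental compositions (aglycone, and residues = sugar minus H2O).
QUILLAIC_ACID = {"C": 30, "H": 46, "O": 5}
RESIDUES = {
    "GlcA": {"C": 6, "H": 8, "O": 6},   # glucuronic acid residue
    "Gal": {"C": 6, "H": 10, "O": 5},   # hexose residue
    "Xyl": {"C": 5, "H": 8, "O": 4},    # pentose residue
    "dHex": {"C": 6, "H": 10, "O": 4},  # deoxyhexose residue (Fuc, Rha, Qui)
    "Ac": {"C": 2, "H": 2, "O": 1},     # acetyl group (net addition of C2H2O)
}

# One (product, residue added) pair per step, in the order described in the Results.
STEPS = [
    ("QA-Mono", "GlcA"),
    ("QA-Di", "Gal"),
    ("QA-Tri", "Xyl"),
    ("QA-TriF", "dHex"),
    ("QA-TriFR", "dHex"),
    ("QA-TriFRX", "Xyl"),
    ("QA-TriFRXX", "Xyl"),
    ("QA-TriF(Q)RXX", "dHex"),
    ("SpB", "Ac"),
]


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def ladder():
    """Return (name, calculated [M-H]- m/z) for quillaic acid and each product."""
    mz = mono_mass(QUILLAIC_ACID) - PROTON
    rows = [("QA", mz)]
    for name, residue in STEPS:
        mz += mono_mass(RESIDUES[residue])
        rows.append((name, mz))
    return rows


for name, mz in ladder():
    print(f"{name:15s} {mz:9.3f}")
```

Under these assumptions the calculated values lie within 0.1 m/z of the values reported for peaks 4 to 11, 12 and 13 (485.3, 661.3, 823.4, 955.4, 1101.5, 1247.5, 1379.6, 1511.6, 1657.7 and 1699.7). Because only masses are summed, the result does not depend on the order in which the residues are added.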
Sequences

ATGGAACTCTTCTTCATATGTGGACTAGTACTCTTCTCCACCCTATCACTAATATCCCTC
TTCCTCCTCCACAACCACAG TTCTGCTCGGGGGTACAGGCTGCCCCCGGGCAGAATGGGATGGCCCTTCATAGGCGAGTC
ATACGAGTTTTTAGCAAACG GGTGGAAAGGGTACCCGGAAAAGTTTATATTTAGCAGGTTGGCCAAGTATAAACCGAATC
AAGTATTTAAGACGTCGATC CTAGGAGAAAAAGTCGCGGTAATGTGTGGCGCGACATGTAACAAGTTCTTGTTCTCGAAC
GAGGGCAAATTAGTAAATGC TTGGTGGCCGAATTCGGTTAATAAGATCTTCCCTTCTTCTACTCAAACTTCTTCCAAGGA
AGAAGCTAAGAAGATGCGGA AACTTCTCCCTACATTCTTTAAACCCGAGGCACTACAACGATACATACCCATCATGGACG
AAATTGCGATCCGACACATG GAGGACGAATGGGAAGGCAAATCCAAAATCGAAGTATTCCCACTCGCAAAACGCTACACA
TTTTGGCTAGCGTGCCGTCT ATTCCTAAGCATAGACGACCCGGTACACGTAGCCAAATTCGCTGACCCGTTCAACGACAT
TGCCTCAGGGATCATATCGA TCCCAATAGACCTCCCCGGCACACCATTCAACCGGGGAATTAAGGCCTCGAATGTCGTGA
GACAGGAATTGAAGACCATA ATAAAGCAGAGGAAATTGGACCTGTCCGACAACAAGGCGTCCCCGACACAGGATATATTG
TCACACATGTTATTAACTCC CGACGAAGACGGGCGGTATATGAATGAATTGGACATTGCTGATAAAATTCTCGGGTTGTT
AATTGGAGGACATGATACTG CAAGTGCTGCTTGTACTTTTGTTGTGAAGTTTCTTGCTGAACTCCCTCATATTTACGACG
GTGTTTACAAAGAGCAAATG GAGATAGCAAAGTCGAAAAAAGAAGGAGAGCGATTAAATTGGGAGGACATACAAAAGATG
AAATATTCATGGAATGTGGC CTGTGAAGTCATGCGTTTAGCACCTCCTCTTCAAGGCGCTTTTCGTGAAGCCCTCTCTGA
TTTTATGTACGCCGGTTTCC AAATTCCCAAGGGTTGGAAGTTATATTGGAGCGCAAACTCAACACATAGGAACCCAGAAT
GCTTCCCAGAGCCGGAAAAA TTCGACCCAGCAAGGTTCGATGGGAGCGGTCCGGCCCCATACACGTACGTACCGTTCGGA
GGAGGGCCGAGAATGTGCCC AGGAAAAGAGTATGCAAGGCTAGAAATATTGGTGTTCATGCACAACATTGTCAAGAGATT
TAAGTGGGAAAAACTTATTC CTGATGAAACCATTGTTGTTAATCCCATGCCGACCCCGGCTAAAGGCCTACCCGTCCGCC
TTCGTCCTCATTCCAAACCC GTAACTGTATCTGCTTAA
SEQ ID NO: 1 SoC28 oxidase (SoC28) nucleotide sequence

MELFFICGLVLFSTLSLISLFLLHNHSSARGYRLPPGRMGWPFIGESYEFLANGWKGYPE
KFIFSRLAKYKPNQVFKTSI LGEKVAVMCGATCNKFLFSNEGKLVNAWWPNSVNKIFPSSTQTSSKEEAKKMRKLLPTFF
KPEALQRYIPIMDEIAIRHM EDEWEGKSKIEVFPLAKRYTFWLACRLFLSIDDPVHVAKFADPFNDIASGIISIPIDLPG
TPFNRGIKASNVVRQELKTI IKQRKLDLSDNKASPTQDILSHMLLTPDEDGRYMNELDIADKILGLLIGGHDTASAACTF
VVKFLAELPHIYDGVYKEQM EIAKSKKEGERLNWEDIQKMKYSWNVACEVMRLAPPLQGAFREALSDFMYAGFQIPKGWK
LYWSANSTHRNPECFPEPEK FDPARFDGSGPAPYTYVPFGGGPRMCPGKEYARLEILVFMHNIVKRFKWEKLIPDETIVV
NPMPTPAKGLPVRLRPHSKP VTVSA*
SEQ ID NO: 2 SoC28 oxidase (SoC28) amino acid sequence
ATGGAGCTAATTACCTTACTAAGTGCTCTTCTTGTTCTTGCTATAGTGAGTTTATCTACA
TTTTTCGTCCTTTACTATAA TACTCCTACTAAGGACGGCAAAACTCTCCCTCCCGGTCGTATGGGCTGGCCTTTTATAGG
CGAGTCCTACGACTTTTTTG CCGCCGGTTGGAAAGGGAAGCCCGAGAGCTTCATTTTCGACCGGTTGAAGAAATTTGCTA
AGGGGAACCTGAACGGTCAG TTCAGGACGAGCTTGTTTGGGAACAAGTCGATTGTGGTGGCGGGGGCTGCTGCTAACAAG
CTTCTTTTCTCGAATGAAAA GAAGCTTGTTACCATGTGGTGGCCCCCGTCTATTGATAAGGCCTTCCCGTCGACTGCACA
GTTGAGTGCGAACGAGGAGG CCTTATTGATGAGGAAGTTTTTTCCTTCTTTTTTGATTAGAAGGGAGGCGCTCCAGCGCT
ACATCCCTATTATGGACGAC TGCACCCGTCGTCACTTCGCGACGGGTGCGTGGGGTCCGTCGGACAAGATCGAGGCCTTC
AATGTGACCCAAGACTACAC GTTTTGGGTCGCCTGCAGAGTCTTCATGAGCATAGACGCTCAGGAAGACCCTGAGACGGT
AGACTCCCTCTTTAGGCACT TTAACGTGCTTAAAGCGGGAATCTACTCAATGCACATCGATCTCCCGTGGACGAACTTCC
ACCACGCGATGAAGGCGTCC CACGCCATCAGGAGCGCCGTGGAGCAAATCGCGAAGAAAAGAAGGGCGGAATTGGCCGAG
GGAAAGGCGTTCCCGACACA AGATATGCTGTCTTACATGCTCGAAACGCCAATTACATCGGCGGAGGATAGCAAGGACGG
GAAAGCGAAGTATTTGAATG ACGCCGATATCGGGACGAAGATACTTGGTCTTCTTGTTGGTGGCCATGACACAAGTAGTA
CAGTTATTGCCTTCTTTTTC AAGTTCATGGCTGAAAATCCTCATGTTTATGAGGCTATTTACAAAGAACAAATGGAGGTA
GCGGCCACAAAAGCGCCGGG GGAGCTTCTAAATTGGGATGACTTGCAGAAAATGAAGTACTCGTGGTGTGCGATTTGCGA
GGTTATGCGTTTGACTCCCC CTGTCCAAGGCGCCTTTCGCCAAGCCATCACCGACTTCACCCATAATGGTTACCTTATTC
CCAAGGGTTGGAAGATATAC TGGAGTACACACTCAACACACAGAAATCCCGAAATCTTCCCACAACCAGAGAAATTCGAC
CCAACAAGATTCGAAGGAAA CGGGCCACCAGCGTTCTCATTCGTGCCATTCGGAGGAGGCCCGAGAATGTGTCCGGGTAA
AGAATATGCAAGGCTACAAG TGCTTACATTTGTGCACCACATTGTGACCAAATTCAAGTGGGAACAAATTCTACCTAATG
AAAAGATCATTGTTAGCCCT ATGCCGTACCCGGAGAAGAATCTTCCGCTTCGTATGATTGCTCGGTCTGAATCCGCCACC
CTCGCTTAA
SEQ ID NO: 3 C28C16 oxidase (SoC28C16) nucleotide sequence
MELITLLSALLVLAIVSLSTFFVLYYNTPTKDGKTLPPGRMGWPFIGESYDFFAAGWKGK
PESFIFDRLKKFAKGNLNGQ FRTSLFGNKSIVVAGAAANKLLFSNEKKLVTMWWPPSIDKAFPSTAQLSANEEALLMRKF
FPSFLIRREALQRYIPIMDD CTRRHFATGAWGPSDKIEAFNVTQDYTFWVACRVFMSIDAQEDPETVDSLFRHFNVLKAG
IYSMHIDLPWTNFHHAMKAS HAIRSAVEQIAKKRRAELAEGKAFPTQDMLSYMLETPITSAEDSKDGKAKYLNDADIGTK
ILGLLVGGHDTSSTVIAFFF KFMAENPHVYEAIYKEQMEVAATKAPGELLNWDDLQKMKYSWCAICEVMRLTPPVQGAFR
QAITDFTHNGYLIPKGWKIY WSTHSTHRNPEIFPQPEKFDPTRFEGNGPPAFSFVPFGGGPRMCPGKEYARLQVLTFVHH
IVTKFKWEQILPNEKIIVSP MPYPEKNLPLRMIARSESATLA*
SEQ ID NO: 4 C28C16 oxidase (SoC28C16) amino acid sequence
ATGGAGTATTTGCCGTACATTGCAACATCAATTGCGTGCATAGTAATACTAAGATGGGCA
TTGAACATGATGCAATGGCT ATGGTTCGAACCGAGGCGGTTGGAGAAATTACTTAGAAAACAAGGACTTCAAGGAAATTC
ATATAAGTTTTTATTTGGAG ATATGAAGGAAAGTTCTATGTTGAGAAATGAAGCTTTAGCAAAGCCTATGCCTATGCCTT
TTGATAATGACTACTTTCCT CGTATTAATCCTTTTGTTGATCAACTTCTTAACAAATATGGTATGAATTGTTTCTTGTGG
ATGGGGCCTGTTCCGGCTAT TCAAATCGGAGAACCAGAGTTAGTTAGGGAAGCTTTCAACCGGATGCACGAGTTTCAAAA
GCCCAAAACTAACCCTTTGA GTGCTTTACTCGCCACCGGACTTGTTAGCTACGAGGGCGACAAATGGGCCAAGCACCGCC
GCCTTATCAACCCCTCTTTT CATGTTGAAAAGCTCAAGCTTATGATTCCTGCATTCCGCGAGAGCATTGTGGAGGTGGTC
AATCAATGGGAGAAGAAAGT ACCTGAAAACGGCTCTGCTGAAATAGATGTATGGCCGTCTCTTACTAGTTTAACCGGAGA
TGTTATCTCAAGAGCTGCCT TTGGCAGCGTGTATGGCGATGGAAGAAGGATTTTCGAACTTCTAGCTGTTCAGAAAGAAC
TCGTTTTAAGTCTGCTCAAG TTTTCGTACATCCCTGGATACACGTATTTGCCAACAGAGGGAAACAAGAAGATGAAGGCG
GTGAACAATGAGATACAAAG ACTACTCGAAAACGTGATTCAAAACAGAAAGAAGGCGATGGAAGCCGGAGAAGCAGCAAA
AGATGATCTGTTGGGTTTAC TGATGGATTCCAATTACAAGGAGAGTATGCTTGAAGGCGGCGGGAAAAACAAAAAATTGA
TCATGAGTTTTCAAGATCTT ATTGACGAGTGTAAGCTCTTCTTCTTAGCTGGGCACGAGACGACTGCTGTGTTACTTGTG
TGGACTTTGATTTTGTTGTG TAAGCACCAAGACTGGCAAACCAAAGCTCGCGAAGAAGTTTTGGCTACTTTTGGAATGTC
GGAACCCACTGATTATGATG CCTTAAACCGTCTCAAGATTGTGACAATGATACTAAATGAGGTCCTAAGATTGTACCCAC
CGGTTGTTTCAACCAACCGA AAACTATTCAAGGGCGAAACAAAACTCGGAAACTTGGTAATACCACCAGGTGTCGGTATC
TCACTATTAACCATCCAAGC AAACCGTGACCCGAAAGTTTGGGGGGAGGATGCAAGTGAGTTCCGACCTGATAGATTTGC
AGAAGGGCTAGTGAAGGCGA CTAAGGGCAATGTCGCGTTTTTCCCCTTCGGTTGGGGTCCTAGGATTTGTATTGGCCAAA
ATTTTGCGCTGACCGAGTCA AAGATGGCGGTTGCTATGATATTGCAACGCTTCACTTTCGACCTTTCACCGTCTTACACT
CATGCTCCGTCGGGCCTTAT TACTCTTAACCCGCAATATGGGGCTCCTCTCATGTTTCGTAGACGTTAA
SEQ ID NO: 5 SoC23 oxidase (SoC23) nucleotide sequence
MEYLPYIATSIACIVILRWALNMMQWLWFEPRRLEKLLRKQGLQGNSYKFLFGDMKESSM
LRNEALAKPMPMPFDNDYFP RINPFVDQLLNKYGMNCFLWMGPVPAIQIGEPELVREAFNRMHEFQKPKTNPLSALLATG
LVSYEGDKWAKHRRLINPSF HVEKLKLMIPAFRESIVEVVNQWEKKVPENGSAEIDVWPSLTSLTGDVISRAAFGSVYGD
GRRIFELLAVQKELVLSLLK FSYIPGYTYLPTEGNKKMKAVNNEIQRLLENVIQNRKKAMEAGEAAKDDLLGLLMDSNYK
ESMLEGGGKNKKLIMSFQDL IDECKLFFLAGHETTAVLLVWTLILLCKHQDWQTKAREEVLATFGMSEPTDYDALNRLKI
VTMILNEVLRLYPPVVSTNR KLFKGETKLGNLVIPPGVGISLLTIQANRDPKVWGEDASEFRPDRFAEGLVKATKGNVAF
FPFGWGPRICIGQNFALTES KMAVAMILQRFTFDLSPSYTHAPSGLITLNPQYGAPLMFRRR*
SEQ ID NO: 6 SoC23 oxidase (SoC23) amino acid sequence
ATGTGGAGGTTAAAAATAGCAGAAGGTGGAAATGACCCGTATTTGTATAGCACAAACAAT
TTTGTAGGACGTCAAACTTG GGAATTTGATAGCGAGTACGGTACTCCTGAAGCTATAAAAGAAGTAGAAGAAGCTCGACA
AATTTTTTACAAAAATCGAT TTCAAGTTAAGCCTTGTGGCGATCTTCTATGGCGTTTTCAGTTCCTAAGAGAGAAAAACT
TCAAGCAAACAATACCGCAA GTGAAGGTGGGTGATGGGGAGGAGGTCACCTACGAAGCCGCCTCAACGACGTTAAAGCGT
TCCGTCAACTTACTCACGGC CCTGCAGGCCGACGACGGTCACTGGCCTGCTGAAATTGCTGGCCCTCAATTTTTCCTCCC
TCCTTTGGTGTTTTGCTTGT ACATCACCGGACATCTCAACGTTGTTTTCAATGTTCATCACCGTGAAGAAATTCTTCGTA
GCATTTATTATCACCAGAAT GAGGATGGAGGGTGGGGGTTGCACATTGAAGGACACAGCACCATGTTCTGTACGGCGTTG
AACTACATATGTTTGCGGAT GCTAGGAGTCGGTCCTGATGAAGGAGACGACAACGCTTGCCCTAGGGCTCGTAAATGGAT
CCTCGACCATGGTAGTGTCA CTCATATCCCTTCTTGGGGAAAGACTTGGCTTTCTATACTCGGTTTGTTTGATTGGTCCG
GAAGTAACCCGATGCCACCT GAGTTTTGGATTCTGCCTACTTTCATGCCTATGTATCCAGCGAAAATGTGGTGTTACTGT
CGAATGGTGTACATGCCGAT GTCGTACTTATACGGGAAGAGGTTCGTTGGTCCGATTACACCTCTAATCAAACAGCTCAG
AGAGGAACTTTTCAGTGAAC CGTTTGAAGAAATCAAGTGGAAAAAAGTCCGTCATCTGTGTGCACCGGAGGATCTCTACT
ACCCGCATCCATTGATTCAA GACTTAATGTGGGACAGTCTTTACTTATTCACCGAGCCTCTTCTTACTCGCTGGCCGTTC
AACAATTTGATACGACAGAA GGCCTTACAAGTGACGATGGATCATATACATTACGAAGATGAGAACAGTCGATACATAAC
CATAGGATGCGTTGAAAAGG TTTTGTGTATGTTGGCCTGTTGGGTTGAAGACCCAAATGGTGTTTGTTACAAAAAACATC
TTGCTAGAGTTCCCGATTAT ATATGGATTGCCGAGGATGGCCTTAAAATGCAGAGTTTTGGAAGTCAACAGTGGGACTGT
GGCTTTGCTGTGCAAGCATT ACTAGCTTCGAATATGAGTCTTGATGAAATCGGACCTGCCCTTAAGAAAGGCCACTTCTT
TATCAAAGAGTCTCAGGTGA AAGATAATCCCTCGGGTGATTTCAAGAGCATGCACCGTCATATCTCGAAGGGATCGTGGA
CGTTTTCTGACCAAGATCAT GGTTGGCAGGTCTCTGACTGCACTGCAGAAGGCCTTAAGTGCTGCTTGATCTTATCAACC
ATGCCGCCAGAAATTGTTGG AGAAAAGATGGACCCTGAGAGGCTCTACGACTCTGTCAATGTCCTGCTTTCTCTACAGAG
TGAAAATGGAGGTCTATCTG CTTGGGAACCAGCTGGAGCACAAGCTTGGTTAGAGCTTCTAAATCCAACGGAATTCTTCG
CAGACATTGTGATCGAGCAT GAGTATGTTGAATGTACTGGTGCATCAATTCAAGCTCTGGTATTATTCAAGAAAATGTAC
CCTGGTCACCGAAAGAAAGA GATCGAAAATTTCATAGCCAAGGCCGCGAAATACCTCGAGGACACCCAATATCCAAACGG
CTCTTGGTATGGAAATTGGG GTGTGTGTTTCACGTATGGGACGTGGTTTGCGCTAGGAGGGCTAGCGGCAGCGGGCAAAA
CATACGCGAATTGTGCTGCG ATGCGAAAAGGTGTTGAATTCCTTCTTAAGTCACAAAAGGAGGACGGTGGGTGGGGCGAA
AGCTATGTTTCATGCCCGAA AAAGGACTTCGTGCCGCTGGAAGGACCATCCAATCTAACTCAAACCGCATGGGCGTTGAT
GGGTCTAATTTACGCACGAC AGATGGAGAGGGATCCGACACCGCTACACCAAGCAGCAAAGCTTTTGATCAATTCACAAC
TCGAAAACGGAGATTTCCCT CAACAGGAAATAACAGGAGTATTCATGAAGAATTGCATGCTACACTATCCAATGTACAGG
ACTATTTATCCACTGTGGGC TATTGCAGAATATAGGACGCATGTTCCTTTGAGGCTTAGTTAA
SEQ ID NO: 7 bAS (SobAS) nucleotide sequence
MWRLKIAEGGNDPYLYSTNNFVGRQTWEFDSEYGTPEAIKEVEEARQIFYKNRFQVKPCG
DLLWRFQFLREKNFKQTIPQ VKVGDGEEVTYEAASTTLKRSVNLLTALQADDGHWPAEIAGPQFFLPPLVFCLYITGHLN
VVFNVHHREEILRSIYYHQN EDGGWGLHIEGHSTMFCTALNYICLRMLGVGPDEGDDNACPRARKWILDHGSVTHIPSWG
KTWLSILGLFDWSGSNPMPP EFWILPTFMPMYPAKMWCYCRMVYMPMSYLYGKRFVGPITPLIKQLREELFSEPFEEIKW
KKVRHLCAPEDLYYPHPLIQ DLMWDSLYLFTEPLLTRWPFNNLIRQKALQVTMDHIHYEDENSRYITIGCVEKVLCMLAC
WVEDPNGVCYKKHLARVPDY IWIAEDGLKMQSFGSQQWDCGFAVQALLASNMSLDEIGPALKKGHFFIKESQVKDNPSGD
FKSMHRHISKGSWTFSDQDH GWQVSDCTAEGLKCCLILSTMPPEIVGEKMDPERLYDSVNVLLSLQSENGGLSAWEPAGA
QAWLELLNPTEFFADIVIEH EYVECTGASIQALVLFKKMYPGHRKKEIENFIAKAAKYLEDTQYPNGSWYGNWGVCFTYG
TWFALGGLAAAGKTYANCAA MRKGVEFLLKSQKEDGGWGESYVSCPKKDFVPLEGPSNLTQTAWALMGLIYARQMERDPT
PLHQAAKLLINSQLENGDFP QQEITGVFMKNCMLHYPMYRTIYPLWAIAEYRTHVPLRLS*
SEQ ID NO: 8 bAS (SobAS) amino acid sequence
ATGTCACCCCACAACACCTGCACTCTACAAATAACCCGAGCCCTCCTCAGCCGCCTCCAC
ATCCTCTTCCACTCCGCCCT CGTCGCCTCCGTCTTCTACTACCGCTTTTCCAACTTCTCCTCTGGCCCGGCATGGGCCCT
CATGACTTTCGCCGAGCTCA CCCTCGCCTTCATCTGGGCCCTCACCCAGGCCTTCCGCTGGCGGCCCGTCGTCCGGGCCG
TCTTCGGGCCCGAGGAGATT GACCCGGCCCAGCTCCCGGGTCTGGACGTGTTCATATGCACGGCAGACCCGAGGAAGGAG
CCGGTGATGGAGGTGATGAA CTCGGTGGTGTCGGCATTGGCGTTGGATTATCCGGCAGAGAAGCTGGCGGTTTACTTGTC
GGACGACGGCGGGTCGCCCT TGACTAGGGAGGTTATTAGGGAGGCTGCCGTGTTTGGGAAGTACTGGGTCGGGTTTTGTG
GGAAGTATAATGTTAAGACG AGGTGTCCTGAGGCCTATTTTAGTTCGTTTTGTGATGGTGAAAGAGTTGATCATAATCAG
GATTATTTGAACGACGAGCT TTCCGTCAAGTCGAAATTTGAAGCGTTTAAGAAGTATGTGCAAAAAGCAAGTGAAGACGC
CACCAAATGTATTGTTGTCA ATGATCGTCCTTCTTGTGTTGAGATTATTCATGACAGCAAGCAGAACGGAGAGGGTGAAG
TGAAAATGCCGCTTCTTGTT TACGTAGCCAGGGAAAAAAGACCGGGTTTTAATCACCATGCTAAAGCCGGAGCCATTAAT
ACACTTCTTCGAGTGTCGGG TTTACTGAGCAATAGCCCTTTCTTTTTGGTGTTGGATTGTGATATGTACTGTAATGATCC
AACGTCTGCGCGTCAAGCTA TGTGCTTCCATCTTGACCCGAAACTAGCTCCCTCTCTCGCGTTTGTGCAATACCCTCAAA
TTTTCTACAACACCAGCAAA AACGACATCTATGATGGTCAGGCCAGAGCAGCTTTTAAGACTAAATATCAAGGCATGGAT
GGTCTTAGAGGGCCGGTTAT GAGTGGCACGGGGTATTTCTTGAAGAGGAAAGCATTGTACGGAAAACCACACGACCAAGA
TGAATTACTCAGGGAGCAGC CAACGAAGGCCTTTGGCTCCTCTAAGATATTCATCGCGTCCCTTGGTGAAAATACCTGTG
TTGCCTTGAAAGGATTGAGT AAAGACGAGTTGTTGCAAGAGACTCAAAAATTGGCTGCTTGTACATACGAATCAAACACG
TTATGGGGTAGCGAGGTTGG ATACTCGTACGACTGCTTGTTGGAGAGCACATACTGTGGGTACTTATTACACTGCAAAGG
ATGGATCTCAGTATATCTAT ACCCGAAAAAGCCGTGTTTCTTGGGGTGTGCAACAGTGGACATGAATGATGCCATGCTTC
AGATAATGAAATGGACTTCT GGATTGATTGGCGTTGGCATATCAAAGTTCAGCCCGTTCACATACGCCATGTCTCGGATC
TCCATTATGCAAAGTCTTTG CTATGCTTACTTCGCTTTTTCGGGCCTATTTGCTGTCTTCTTCTTGATCTATGGCGTTGT
TCTTCCGTATTCCCTCTTGC AGGGTGTTCCGCTCTTCCCCAAGGCAGGAGATCCATGGCTTTTGGCATTTGCGGGAGTAT
TCATATCCTCGCTTCTTCAG CACCTGTACGAGGTTCTCTCAAGCGGAGAAACAGTGAAAGCGTGGTGGAACGAGCAAAGA
ATCTGGATCATAAAATCAAT CACCGCCTGTCTGTTTGGTCTTCTGGACGCTATGCTTAACAAAATTGGCGTCTTAAAGGC
TAGTTTCAGACTGACAAACA AGGCTGTCGACAAACAAAAACTCGATAAATACGAGAAGGGCAGGTTCGATTTCCAAGGCG
CACAAATGTTCATGGTCCCT CTCATGATTCTGGTGGTATTCAATTTGGTCTCGTTCTTTGGCGGCTTAAGAAGAACCGTC
ATTCATAAAAACTACGAAGA CATGTTCGCGCAGCTTTTCCTCTCGTTGTTCATTCTAGCTCTTAGCTATCCTATCATGGA
GGAGATTGTCCGAAAAGCTA GAAAAGGTCGCTCTTAA
SEQ ID NO: 9 SoQA-GlcAT (SoCSL) nucleotide sequence
MSPHNTCTLQITRALLSRLHILFHSALVASVFYYRFSNFSSGPAWALMTFAELTLAFIWA
LTQAFRWRPVVRAVFGPEEI DPAQLPGLDVFICTADPRKEPVMEVMNSVVSALALDYPAEKLAVYLSDDGGSPLTREVIR
EAAVFGKYWVGFCGKYNVKT RCPEAYFSSFCDGERVDHNQDYLNDELSVKSKFEAFKKYVQKASEDATKCIVVNDRPSCV
EIIHDSKQNGEGEVKMPLLV YVAREKRPGFNHHAKAGAINTLLRVSGLLSNSPFFLVLDCDMYCNDPTSARQAMCFHLDP
KLAPSLAFVQYPQIFYNTSK NDIYDGQARAAFKTKYQGMDGLRGPVMSGTGYFLKRKALYGKPHDQDELLREQPTKAFGS
SKIFIASLGENTCVALKGLS KDELLQETQKLAACTYESNTLWGSEVGYSYDCLLESTYCGYLLHCKGWISVYLYPKKPCF
LGCATVDMNDAMLQIMKWTS GLIGVGISKFSPFTYAMSRISIMQSLCYAYFAFSGLFAVFFLIYGVVLPYSLLQGVPLFP
KAGDPWLLAFAGVFISSLLQ HLYEVLSSGETVKAWWNEQRIWIIKSITACLFGLLDAMLNKIGVLKASFRLTNKAVDKQK
LDKYEKGRFDFQGAQMFMVP LMILVVFNLVSFFGGLRRTVIHKNYEDMFAQLFLSLFILALSYPIMEEIVRKARKGRS*
SEQ ID NO: 10 SoQA-GlcAT (SoCSL) amino acid sequence
ATGGGTTCAAATACAGAAGCAACTGAAATACCCAAAATGCCCTTGAAAATAGTCTTCCTT
ACACTTCCTATAGCCGGACA CATGCTCCACATTGTAGACACCGCAAGCACATTTGCCATACATGGAGTCGAGTGTACCAT
AATCACTACCCCTGCAAATG TCCCTTTCATCGAAAAATCAATCTCTGCAACCAACACCACAATTCGACAGTTCCTCAGTA
TCCGCCTCGTCGATTTCCCC CATGAAGCTGTCGGCCTTCCTCCCGGTGTCGAAAACTTCAGTGCAGTCACGTGTCCGGAT
ATGAGACCCAAAATATCGAA AGGACTTTCGATCATACAAAAACCAACTGAAGACTTAATCAAGGAAATATCACCTGATTG
TATTGTTTCTGACATGTTTT ACCCTTGGACTTCTGATTTCGCCCTTGAAATAGGTGTTCCAAGGGTGGTTTTTCGCGGTT
GTGGGATGTTTCCCATGTGT TGTTGGCATAGTATTAAGTCACATTTACCACATGAGAAGGTTGACAGAGATGATGAAATG
ATTGTTCTTCCTACATTGCC TGATCATATAGAGATGAGAAAATCTACATTACCTGATTGGGTAAGGAAACCAACTGGGTA
CAGTTATTTGATGAAGATGA TTGATGCGGCCGAATTGAAGAGTTATGGAGTAATTGTTAATAGTTTTAGTGATTTAGAGA
GGGATTATGAGGAGTATTTT AAGAATGTCACCGGGTTAAAGGTGTGGACCGTCGGTCCGATTTCGTTACATGTGGGTCGG
AATGAGGAGTTAGAAGGGTC AGATGAGTGGGTCAAATGGCTAGATGGGAAAAAACTAGACTCGGTTATTTATGTTAGTTT
TGGTGGGGTGGCGAAGTTTC CACCCCACCAGCTGAGAGAAATCGCGGCCGGATTAGAATCATCTGGCCACGATTTTGTTT
GGGTGGTGAGGGCGAGTGAC GAAAATGGCGACCAAGCTGAAGCGGATGAGTGGTCCCTACAAAAATTTAAAGAGAAAATG
AAGAAAACTAACCATGGGTT GGTTATAGAGAGTTGGGTCCCACAACTTATGTTTTTGGAACATAAGGCTATCGGAGGAAT
GTTGACACATGTTGGTTGGG GTACAATGTTGGAAGGGATTACAGCGGGTTTACCGTTGGTGACGTGGCCATTGTATGCCG
AGCAGTTTTACAATGAGAGG TTGGTGGTTGATGTGTTGAAGATTGGAGTTGGTGTTGGGGTGAAAGAGTTCTGTGGGTTG
GATGATATTGGCAAGAAGGA GACCATTGGTAGGGAGAATATCGAGGCATCGGTGAGATTAGTGATGGGCGATGGCGAGGA
GGCGGCTGCCATGAGACTGC GGGTGAAGGAGTTGAGTGAGGCGTCTATGAAGGCGGTTCGAGAAGGTGGTTCATCTAAGG
CTAATATACACGATTTCCTT AACGAGCTGTCTACGTTGAGATCGTTAAGGCAGGCTTGA
SEQ ID NO: 11 QA-GalT (SoC3Gal) nucleotide sequence
MGSNTEATEIPKMPLKIVFLTLPIAGHMLHIVDTASTFAIHGVECTIITTPANVPFIEKS
ISATNTTIRQFLSIRLVDFP HEAVGLPPGVENFSAVTCPDMRPKISKGLSIIQKPTEDLIKEISPDCIVSDMFYPWTSDF
ALEIGVPRVVFRGCGMFPMC CWHSIKSHLPHEKVDRDDEMIVLPTLPDHIEMRKSTLPDWVRKPTGYSYLMKMIDAAELK
SYGVIVNSFSDLERDYEEYF KNVTGLKVWTVGPISLHVGRNEELEGSDEWVKWLDGKKLDSVIYVSFGGVAKFPPHQLRE
IAAGLESSGHDFVWVVRASD ENGDQAEADEWSLQKFKEKMKKTNHGLVIESWVPQLMFLEHKAIGGMLTHVGWGTMLEGI
TAGLPLVTWPLYAEQFYNER LVVDVLKIGVGVGVKEFCGLDDIGKKETIGRENIEASVRLVMGDGEEAAAMRLRVKELSE
ASMKAVREGGSSKANIHDFL NELSTLRSLRQA*
SEQ ID NO: 12 QA-GalT (SoC3Gal) amino acid sequence
ATGAAGTCACCACTAAAGTTGTACTTCCTGCCATACATATCACCAGGCCATATGATCCCA
CTTTCCGAAATGGCTCGGTT ATTCGCCAACCAAGGGCACCACGTGACCATCATCACCACCACCTCGAACGCCACCCTCCT
CCAAAAATACACCACCGCCA CCCTGTCTCTACATCTTATTCCCCTCCCTACCAAAGAGGCCGGCCTTCCAGACGGCCTCG
AAAACTTCATTTCTGTCAAC GATCTTGAAACCGCTGGCAAACTCTACTACGCTCTTTCCCTCCTGCAACCCGTCATTGAG
GAGTTTATCACGTCTAACCC GCCCGATTGTATCGTGTCCGACATGTTCTATCCCTGGACTGCGGACCTGGCGTCCCAACT
CCAGGTCCCGCGTATGGTCT TTCATGCAGCGTGTATATTCGCTATGTGCATGAAAGAGTCAATGCGGGGCCCTGACGCCC
CGCATCTGAAGGTCAGCTCT GATTATGAGCTGTTTGAAGTCAAGGGGCTACCGGACCCGGTTTTTATGACCCGGGCCCAG
CTCCCTGACTACGTGCGTAC CCCAAACGGGTACACACAGCTCATGGAGATGTGGCGAGAAGCGGAAAAGAAAAGTTACGG
TGTTATGGTTAATAATTTTT ACGAACTTGACCCGGCTTATACCGAGCATTATAGTAAGATTATGGGCCATAAGGTCTGGA
ATATTGGGCCTGCGGCCCAA ATTCTTCACCGTGGTTCTGGTGATAAAATCGAGAGGGTTCACAAAGCCGTTGTTGGTGAA
AACCAATGCTTGAGTTGGCT CGACACTAAGGAACCTAACTCGGTTTTTTACGTCTGCTTTGGGAGCGCGATTAGGTTCCC
TGATGATCAGCTCTACGAAA TTGCTAGCGCGCTAGAATCATCTGGCGCGCAGTTTATATGGGCCGTTCTTGGAAAAGACT
CGGATAATTCAGACTCGAAC TCAGACTCAGAATGGCTGCCTGCAGGGTTCGAGGAAAAAATGAAGGAAACGGGTAGAGGG
ATGATAATACGAGGTTGGGC CCCACAGGTGTTGATATTGGACCACCCGTCTGTAGGCGGGTTTATGACTCACTGTGGCTG
GAACTCGACAATTGAGGGGG TTAGCGCGGGGGTGGGGATGGTGACATGGCCGTTGTATGCGGAACAATTTTACAATGAGA
AGTTAATAACACAAGTGCTT AAGATAGGGGTGGAGGCCGGGGTGGAGGAGTGGAACTTGTGGGTGGATGTTGGGAGGAAA
TTGGTGAAGAGAGAGAAGAT CGAGGCGGCAATTAGGGCGGTGATGGGTGAGGCCGGGGTGGAGATGAGGAGGAAGGCGAA
AGAGTTGAGTGTCAAGGCTA AGAAGGCGGTGCAGGATGGTGGGTCGTCTCACCGTAATTTAATGGCTTTGATCGAAGATC
TGCAGAGGATTAGAGATGAT AAAATGAGTAAGGTTGCTAATTAG
SEQ ID NO: 13 SoQA-R XylT (SoC3Xyl) nucleotide sequence
MKSPLKLYFLPYISPGHMIPLSEMARLFANQGHHVTIITTTSNATLLQKYTTATLSLHLI
PLPTKEAGLPDGLENFISVN DLETAGKLYYALSLLQPVIEEFITSNPPDCIVSDMFYPWTADLASQLQVPRMVFHAACIF
AMCMKESMRGPDAPHLKVSS DYELFEVKGLPDPVFMTRAQLPDYVRTPNGYTQLMEMWREAEKKSYGVMVNNFYELDPAY
TEHYSKIMGHKVWNIGPAAQ ILHRGSGDKIERVHKAVVGENQCLSWLDTKEPNSVFYVCFGSAIRFPDDQLYEIASALES
SGAQFIWAVLGKDSDNSDSN SDSEWLPAGFEEKMKETGRGMIIRGWAPQVLILDHPSVGGFMTHCGWNSTIEGVSAGVGM
VTWPLYAEQFYNEKLITQVL KIGVEAGVEEWNLWVDVGRKLVKREKIEAAIRAVMGEAGVEMRRKAKELSVKAKKAVQDG
GSSHRNLMALIEDLQRIRDD KMSKVAN*
SEQ ID NO: 14 SoQA-R XylT (SoC3Xyl) amino acid sequence
ATGTCGGATCAAAATGATAAAAAGGTCGAAATAATAGTATTTCCATACCATGGCCAAGGT
CACATGAACACCATGCTACA ATTCGCCAAACGAATTGCGTGGAAAAACGCCAAAGTTACAATCGCTACGACATTGTCCAC
CACTAATAAAATGAAGTCCA AGGTCGAGAATGCCTGGGGCACTTCTATAACCTTGGACTCCATTTACGATGACTCTGACG
AGTCGCAGATAAAATTCATG GACCGTATGGCCAGGTTTGAGGCTGCTGCAGCCTCGAGCCTGTCCAAACTCCTGGTCCAG
AAAAAAGAAGAAGCTGACAA CAAAGTCTTGTTGGTTTACGACGGGAATTTGCCGTGGGCGCTGGATATCGCCCACGAGCA
TGGCGTGCGTGGGGCCGCGT TTTTTCCACAGTCGTGTGCGACGGTCGCCACGTACTACTCGTTGTATCAAGAGACGCAGG
GGAAGGAGCTAGAGACGGAG TTGCCGGCGGTGTTTCCGCCGTTGGAGTTGATACAACGGAATGTACCGAATGTGTTTGGA
TTGAAGTTTCCGGAGGCGGT TGTGGCTAAGAATGGGAAGGAGTATAGTCCTTTTGTGTTGTTTGTGTTGAGGCAGTGTAT
TAACCTTGAGAAGGCTGATT TGCTGCTTTTCAATCAGTTTGATAAGTTGGTTGAACCTGGGGAGGTTCTGCAATGGATGT
CGAAGATATTCAACGTAAAG ACAATCGGACCGACACTTCCATCTTCATACATCGACAAACGAATCAAAGACGACGTGGAC
TACGGTTTCCACGCATTCAA CCTCGACAACAACTCCTGCATCAATTGGCTTAACTCCAAACCCGCTCGCTCTGTCATCTA
CATAGCATTTGGGAGCAGCG TCCACTACAGCGTTGAGCAAATGACCGAAATAGCCGAGGCCTTAAAGAGCCAACCGAACA
ATTTCCTTTGGGCAGTCCGA GAAACCGAACAAAAGAAACTCCCTGAAGACTTCGTCCAACAAACCTCGGAAAAAGGGTTA
ATGCTCTCATGGTGCCCTCA ATTAGATGTTTTGGTGCATGAATCAATCAGTTGTTTTGTGACACATTGTGGTTGGAACTC
GATTACAGAGGCACTTAGCT TCGGGGTACCAATGCTGTCAGTGCCACAGTTTTTGGACCAGCCTGTTGATGCTCACTTTG
TGGAACAGGTTTGGGGTGCT GGAATTACGGTCAAGAGGAGCGAAGACGGTTTGGTTACTCGAGACGAAATTGTTCGGTGC
TTGGAGGTGTTAAATAATGG CGAAAAGGCGGAGGAAATTAAGGCGAATGTGGCGAGGTGGAAGGTTTTGGCTAAGGAAGC
TTTGGATGAAGGTGGTAGTT CTGATAAGCACATTGACGAAATTATTGAGTGGGTTTCATCTTTCTAA
SEQ ID NO: 15 QATriFuT (SoC28F) nucleotide sequence
MSDQNDKKVEIIVFPYHGQGHMNTMLQFAKRIAWKNAKVTIATTLSTTNKMKSKVENAWG
TSITLDSIYDDSDESQIKFM DRMARFEAAAASSLSKLLVQKKEEADNKVLLVYDGNLPWALDIAHEHGVRGAAFFPQSCA
TVATYYSLYQETQGKELETE LPAVFPPLELIQRNVPNVFGLKFPEAVVAKNGKEYSPFVLFVLRQCINLEKADLLLFNQF
DKLVEPGEVLQWMSKIFNVK TIGPTLPSSYIDKRIKDDVDYGFHAFNLDNNSCINWLNSKPARSVIYIAFGSSVHYSVEQ
MTEIAEALKSQPNNFLWAVR ETEQKKLPEDFVQQTSEKGLMLSWCPQLDVLVHESISCFVTHCGWNSITEALSFGVPMLS
VPQFLDQPVDAHFVEQVWGA GITVKRSEDGLVTRDEIVRCLEVLNNGEKAEEIKANVARWKVLAKEALDEGGSSDKHIDE
IIEWVSSF*
SEQ ID NO: 16 QATriFuT (SoC28F) amino acid sequence
ATGTCTGCCAAAATGTTGCACGTAGTTATGTACCCATGGTTCGCATACGGTCACATGATC
CCATTTTTACATTTATCGAA CAAATTAGCCGAAACCGGTCACAAAGTCACGTACATACTCCCCCCAAAAGCGCTAACCCG
CTTACAAAACCTCAACCTAA ATCCGACCCAAATCACGTTCCGGACCATCACGGTCCCCCGAGTTGATGGGTTACCCGCTG
GTGCCGAGAACGTGACCGAT ATTCCGGATATTACTCTGCATACTCATTTGGCCACGGCGCTGGATCGAACCCGACCCGAA
TTTGAGACGATTGTCGAGTT GATTAAGCCGGATGTGATAATGTATGACGTGGCGTATTGGGTGCCAGAGGTGGCGGTGAA
GTATGGGGCGAAGAGTGTTG CGTATAGTGTGGTGTCGGCGGCAAGTGTGTCGCTGAGTAAGACGGTGGTTGATCGGATGA
CGCCGTTGGAGAAACCGATG ACGGAGGAGGAGAGGAAGAAGAAGTTTGCTCAGTATCCTCACTTAATTCAGCTTTATGGT
CCTTTTGGTGAAGGTATCAC CATGTACGACCGTCTAACAGGCATGCTTAGCAAGTGTGACGCTATAGCTTGTAGGACCTG
CCGTGAGATTGAAGGCAAGT ATTGCCAATATTTATCCACTCAATATGAAAAGAAAGTCACCCTTACCGGCCCGGTTCTTC
CCGAGCCGGAAGTCGGGGCC ACACTGGAGGCCCCTTGGTCCGAGTGGCTTAGTCGGTTCAAGCTTGGTTCGGTTTTATTT
TGTGCCTTTGGGAGCCAATT TTACTTGGACAAGGACCAGTTCCAGGAAATCATCCTCGGGCTTGAAATGACAAATTTACC
CTTTCTGATGGCTGTTCAGC CCCCTAAGGGTTGCGCCACTATCGAGGAGGCGTACCCTGAGGGGTTTGCTGAGCGGGTCA
AGGACCGAGGAGTCGTGACA AGCCAGTGGGTGCAACAGCTGGTTATACTGGCCCACCCAGCGGTTGGGTGCTTTGTGAAC
CATTGCGCGTTTGGGACAAT GTGGGAGGCCTTATTGAGCGAAAAGCAGTTGGTGATGATCCCTCAACTAGGTGACCAAAT
ACTGAACACCAAAATGTTGG CCGATGAATTGAAAGTCGGGGTTGAAGTCGAGAGAGGAATCGGTGGGTGGGTGTCTAAGG
AGAATTTGTGTAAGGCGATC AAGTCCGTCATGGACGAGGATAGTGAAATTGGCAAGGACGTGAAACAAAGTCATGAAAAA
TGGAGGGCGACTTTGTCGAG CAAAGATTTAATGTCGACTTATATTGATAGTTTCATCAAAGATTTACAAGCACTCGTCGA
GTGA
SEQ ID NO: 17 QA-TriFR (SoC28Rha) nucleotide sequence
MSAKMLHVVMYPWFAYGHMIPFLHLSNKLAETGHKVTYILPPKALTRLQNLNLNPTQITF
RTITVPRVDGLPAGAENVTD IPDITLHTHLATALDRTRPEFETIVELIKPDVIMYDVAYWVPEVAVKYGAKSVAYSVVSA
ASVSLSKTVVDRMTPLEKPM TEEERKKKFAQYPHLIQLYGPFGEGITMYDRLTGMLSKCDAIACRTCREIEGKYCQYLST
QYEKKVTLTGPVLPEPEVGA TLEAPWSEWLSRFKLGSVLFCAFGSQFYLDKDQFQEIILGLEMTNLPFLMAVQPPKGCAT
IEEAYPEGFAERVKDRGVVT SQWVQQLVILAHPAVGCFVNHCAFGTMWEALLSEKQLVMIPQLGDQILNTKMLADELKVG
VEVERGIGGWVSKENLCKAI KSVMDEDSEIGKDVKQSHEKWRATLSSKDLMSTYIDSFIKDLQALVE*
SEQ ID NO: 18 QA-TriFR (SoC28Rha) amino acid sequence
ATGGGTACTAAAGAGTTACACATAGTAATGTACCCATGGCTAGCATTTGGTCATTTCATA
CCATACCTTCATCTCTCTAA CAAACTCGCTCAAAAAGGCCATAAAATCACTTTCTTACTTCCTCATAGAGCCAAACTTCA
ACTTGACTCCCAAAATTTAT ATCCCTCACTTATTACCCTCGTACCAATTACCGTCCCACAGGTCGACACCCTTCCTCTCG
GGGCCGAATCGACTGCTGAT ATCCCCCTTAGTCAGCACGGTGACCTCTCCATCGCCATGGACCGTACTCGACCCGAGATT
GAGTCTATCTTGTCTAAACT TGACCCAAAACCGGACCTGATTTTCTTCGATATGGCGCAGTGGGTGCCTGTCATAGCGTC
TAAGCTTGGGATCAAGTCTG TTTCGTATAATATCGTTTGCGCCATTTCGTTGGACCTTGTTCGAGATTGGTATAAGAAGG
ATGATGGAAGTAATGTGCCT AGTTGGACATTGAAGCATGACAAGTCATCCCATTTCGGGGAGAATATTAGTATTCTCGAG
CGAGCGCTGATTGCGCTCGG GACGCCTGATGCCATAGGCATCAGGTCGTGTCGGGAGATAGAGGGGGAGTACTGTGACAG
CATAGCGGAACGATTTAAGA AACCGGTCTTACTAAGCGGGACGACCTTACCTGAACCATCCGACGACCCACTTGACCCAA
AATGGGTCAAGTGGCTCGGA AAGTTCGAGGAAGGTTCGGTTATTTTTTGCTGCCTAGGGAGTCAGCACGTGTTAGACAAG
CCCCAGCTCCAGGAGCTGGC GCTGGGGCTTGAAATGACGGGGTTGCCATTCTTCCTAGCGATTAAACCACCGCTAGGATA
CGCAACCCTAGACGAGGTAC TACCCGAGGGGTTTTCAGAACGGGTTCGAGATCGAGGGGTGGCTCATGGGGGATGGGTAC
AACAGCCTCAGATGCTGGCA CACCCTTCTGTAGGGTGCTTTTTGTGTCACTGTGGGTCGTCGTCGATGTGGGAGGCATTA
GTGAGTGATACGCAGCTCGT ATTGTTTCCTCAAATACCAGATCAAGCTCTAAACGCGGTTTTAATGGCGGATAAACTTAA
GGTCGGGGTGAAGGTCGAGA GAGAGGACGACGGAGGGGTGTCGAAAGAGGTTTGGAGTAGAGCAATAAAGAGTGTGATGG
ATAAGGAGAGTGAAATTGCT GCGGAAGTGAAGAAGAATCATACTAAGTGGAGAGATATGTTGATTAATGAAGAATTTGTG
AATGGGTACATTGACAGTTT CATTAAGGATCTACAAGATCTTGTTGAGAAGTAG
SEQ ID NO: 19 SoQA-TriFRXylT (SoC28Xyl1) nucleotide sequence
MGTKELHIVMYPWLAFGHFIPYLHLSNKLAQKGHKITFLLPHRAKLQLDSQNLYPSLITL
VPITVPQVDTLPLGAESTAD IPLSQHGDLSIAMDRTRPEIESILSKLDPKPDLIFFDMAQWVPVIASKLGIKSVSYNIVC
AISLDLVRDWYKKDDGSNVP SWTLKHDKSSHFGENISILERALIALGTPDAIGIRSCREIEGEYCDSIAERFKKPVLLSG
TTLPEPSDDPLDPKWVKWLG KFEEGSVIFCCLGSQHVLDKPQLQELALGLEMTGLPFFLAIKPPLGYATLDEVLPEGFSE
RVRDRGVAHGGWVQQPQMLA HPSVGCFLCHCGSSSMWEALVSDTQLVLFPQIPDQALNAVLMADKLKVGVKVEREDDGGV
SKEVWSRAIKSVMDKESEIA AEVKKNHTKWRDMLINEEFVNGYIDSFIKDLQDLVEK*
SEQ ID NO: 20 SoQA-TriFRXylT (SoC28Xyl1) amino acid sequence
ATGGAGGAATCAAAGGAGGAAGTACATGTAGCATTCTTCCCATTCATGACACCAGGTCAC
TCAATCCCAATGCTAGACTT GGTACGTTTGTTCATTGCTCGTGGTGTCAAAACTACTGTCTTCACTACTCCTCTTAATGC
TCCTAATATTTCCAAATACC TCAACATTATCCAAGATTCCTCATCAAACAAAAACACCATTTATGTAACTCCTTTTCCTT
CTAAAGAAGCCGGTTTACCG GAAGGTGTGGAAAGCCAGGATAGTACCACTTCCCCCGAAATGACCCTCAAGTTCTTTGTT
GCTATGGAATTACTTCAAGA CCCCCTTGATGTTTTTTTAAAAGAAACCAAACCTCATTGTCTTGTTGCTGATAATTTCTT
CCCTTACGCCACCGACATCG CTTCTAAGTATGGCATTCCTAGGTTTGTTTTTCAGTTCACTGGCTTCTTTCCTATGTCTG
TCATGATGGCCTTAAATCGT TTCCACCCTCAAAACTCTGTATCATCTGATGACGACCCCTTTCTTGTTCCCAGTTTACCC
CATGACATCAAATTGACTAA GTCACAATTGCAACGAGAGTACGAGGGTAGTGATGGTATTGACACCGCTCTTTCTAGGCT
CTGTAATGGCGCCGGTAGAG CTTTGTTTACTAGTTATGGTGTCATTTTTAACAGCTTCTACCAACTCGAACCTGATTATG
TTGATTATTATACCAACACC ATGGGGAAACGATCCAGGGTTTGGCATGTGGGCCCAGTGTCGTTATGCAACCGTCGACAC
GTGGAGGGTAAATCTGGTAG GGGGAGAAGTGCTTCAATTAGTGAGCATTTGTGCTTAGAGTGGCTCAATGCCAAAGAACC
AAATTCAGTGATATATGTAT GTTTTGGTAGTCTCACATGTTTCTCCAATGAGCAACTCAAAGAAATCGCAACCGCCTTAG
AAAGGTGTGAAGAGTATTTT ATATGGGTGTTGAAGGGTGGCAAAGATAATGAGCAAGAGTGGTTGCCACAAGGGTTTGAA
GAGAGGGTTGAAGGGAAAGG ACTAATCATACGGGGGTGGGCCCCACAAGTGTTGATTTTAGACCATGAAGCCATAGGCGG
GTTTGTGACACACTGTGGTT GGAACTCGACACTAGAAAGTATATCAGCGGGGGTGCCCATGGTGACATGGCCCATATATG
CAGAGCAATTTTATAATGAG AAATTGGTGACGGATGTACTGAAGGTGGGGGTTAAAGTAGGGTCAATGAAGTGGAGTGAG
ACGACGGGGGCGACTCATTT AAAGCATGAGGAAATAGAAAAAGCATTGAAGCAAATAATGGTGGGAGAAGAGGTGTTAGA
GATGAGAAAAAGAGCAAGTA AGTTGAAAGAGATGGCTTATAATGCTGTTGAAGAAGGAGGCTCTTCTTATTCTCACCTCA
CTTCCTTAATCGACGACCTT ATGGCTTCCAAAGCTGTGCTACAAAAATTTTGA
SEQ ID NO: 21 SoQA-TriFRXXylT (SoC28Xyl2) nucleotide sequence
MEESKEEVHVAFFPFMTPGHSIPMLDLVRLFIARGVKTTVFTTPLNAPNISKYLNIIQDS
SSNKNTIYVTPFPSKEAGLP EGVESQDSTTSPEMTLKFFVAMELLQDPLDVFLKETKPHCLVADNFFPYATDIASKYGIP
RFVFQFTGFFPMSVMMALNR FHPQNSVSSDDDPFLVPSLPHDIKLTKSQLQREYEGSDGIDTALSRLCNGAGRALFTSYG
VIFNSFYQLEPDYVDYYTNT MGKRSRVWHVGPVSLCNRRHVEGKSGRGRSASISEHLCLEWLNAKEPNSVIYVCFGSLTC
FSNEQLKEIATALERCEEYF IWVLKGGKDNEQEWLPQGFEERVEGKGLIIRGWAPQVLILDHEAIGGFVTHCGWNSTLES
ISAGVPMVTWPIYAEQFYNE KLVTDVLKVGVKVGSMKWSETTGATHLKHEEIEKALKQIMVGEEVLEMRKRASKLKEMAY
NAVEEGGSSYSHLTSLIDDL MASKAVLQKF*
SEQ ID NO: 22 SoQA-TriFRXXylT (SoC28Xyl2) amino acid sequence
The full- truncated feedback-insensitive form (tHMGR). The sequence for tHMGR is also given separately below.
ATGGCTGTGGAGGTTCACCGCCGGGCTCCCGCGCCCCATGGCCGGGGCACCGGGGAGAAG
GGCCGCGTGCAGGCCGGGGA CGCGCTGCCGCTGCCGATCCGCCACACCAACCTCATCTTCTCGGCGCTCTTCGCCGCCTC
CCTCGCATACCTCATGCGCC GCTGGAGGGAGAAGATCCGCAACTCCACGCCGCTCCACGTCGTGGGGCTCACCGAGATCT
TCGCCATCTGCGGCCTCGTC GCCTCCCTCATCTACCTCCTCAGCTTCTTCGGCATCGCCTTCGTGCAGTCCGTCGTATCC
AACAGCGACGACGAGGACGA GGACTTCCTCATCGCGGCTGCAGCATCCCAGGCCCCCCCGCCGCCCTCCTCCAAGCCCGC
GCCGCAGCAGTGCGCCCTGC TGCAGAGCGCCGGAGTCGCGCCCGAGAAAATGCCCGAGGAGGACGAGGAAATCGTCGCCG
GGGTCGTCGCAGGGAAGATC CCCTCCTACGTGCTCGAGACCAGGCTAGGCGACTGCCGCAGGGCAGCCGGGATCCGCCGC
GAGGCGCTGCGCCGGATCAC CGGCAGGGAGATCGACGGCCTTCCCCTCGACGGCTTCGACTACGACTCGATTCTCGGACA
GTGCTGCGAGATGCCCGTCG GGTACGTGCAGCTGCCGGTCGGCGTCGCGGGGCCGCTCGTCCTCGACGGCCGCCGCATAT
ACGTCCCGATGGCCACCACG GAGGGCTGCCTAATCGCCAGCACCAACCGCGGATGCAAGGCCATTGCCGAGTCCGGAGGC
GCATCCAGCGTCGTGTACCG CGACGGGATGACCCGCGCCCCCGTAGCCCGCTTCCCCTCCGCACGACGCGCCGCAGAGCT
CAAGGGCTTCCTGGAGAATC CGGCCAACTACGACACCCTGTCCGTGGTCTTTAACAGATCAAGCAGATTTGCAAGGCTGC
AGGGGGTCAAGTGCGCCATG GCTGGGAGGAACTTGTACATGAGGTTCACCTGCAGCACCGGGGATGCCATGGGGATGAAC
ATGGTCTCCAAGGGCGTCCA AAATGTGCTCGACTATCTGCAGGAGGACTTCCCTGACATGGACGTTGTCAGCATCTCAGG
CAACTTTTGTTCCGACAAGA AATCAGCTGCTGTAAACTGGATTGAAGGCCGTGGAAAGTCCGTGGTTTGTGAGGCAGTAA
TCAGAGAGGAAGTTGTCCAC AAGGTTCTCAAGACCAACGTTCAGTCACTCGTGGAGTTGAATGTGATCAAGAACCTTGCT
GGCTCAGCAGTTGCTGGTGC TCTTGGGGGTTTCAACGCCCACGCAAGCAACATCGTAACGGCTATCTTCATTGCCACTGG
TCAGGATCCTGCACAGAATG TGGAGAGCTCACAGTGTATCACTATGTTGGAAGCTGTAAATGATGGCAGAGACCTTCACA
TCTCCGTTACAATGCCATCT ATCGAGGTGGGCACAGTTGGTGGAGGCACGCAGCTGGCCTCACAGTCGGCCTGCTTGGAC
CTACTGGGCGTCAAAGGCGC CAACAGGGAATCTCCGGGGTCGAACGCTAGGCTGCTGGCCACGGTGGTGGCTGGTGCCGT
CCTAGCTGGGGAGCTGTCCC TCATCTCCGCCCAAGCTGCCGGCCATCTGGTCCAGAGCCACATGAAATACAACAGATCCA
GCAAGGACATGTCCAAGATC GCCTGCTGA
SEQ ID NO: 23 - AsHMGR (Avena strigosa HMG-CoA reductase) coding sequence (1689bp):
MAVEVHRRAPAPHGRGTGEKGRVQAGDALPLPIRHTNLIFSALFAASLAYLMRRWREKIR
NSTPLHVVGLTEIFAICGLV ASLIYLLSFFGIAFVQSVVSNSDDEDEDFLIAAAASQAPPPPSSKPAPQQCALLQSAGVA
PEKMPEEDEEIVAGVVAGKI PSYVLETRLGDCRRAAGIRREALRRITGREIDGLPLDGFDYDSILGQCCEMPVGYVQLPV
GVAGPLVLDGRRIYVPMATT EGCLIASTNRGCKAIAESGGASSVVYRDGMTRAPVARFPSARRAAELKGFLENPANYDTL
SVVFNRSSRFARLQGVKCAM AGRNLYMRFTCSTGDAMGMNMVSKGVQNVLDYLQEDFPDMDVVSISGNFCSDKKSAAVNW
IEGRGKSVVCEAVIREEVVH KVLKTNVQSLVELNVIKNLAGSAVAGALGGFNAHASNIVTAIFIATGQDPAQNVESSQCI
TMLEAVNDGRDLHISVTMPS IEVGTVGGGTQLASQSACLDLLGVKGANRESPGSNARLLATVVAGAVLAGELSLISAQAA
GHLVQSHMKYNRSSKDMSKI AC*
SEQ ID NO: 24 - AsHMGR (Avena strigosa HMG-CoA reductase) translated nucleotide sequence (562aa)
ATGGCGCCCGAGAAAATGCCCGAGGAGGACGAGGAAATCGTCGCCGGGGTCGTCGCAGGG
AAGATCCCCTCCTACGTGCT CGAGACCAGGCTAGGCGACTGCCGCAGGGCAGCCGGGATCCGCCGCGAGGCGCTGCGCCG
GATCACCGGCAGGGAGATCG ACGGCCTTCCCCTCGACGGCTTCGACTACGACTCGATTCTCGGACAGTGCTGCGAGATGC
CCGTCGGGTACGTGCAGCTG CCGGTCGGCGTCGCGGGGCCGCTCGTCCTCGACGGCCGCCGCATATACGTCCCGATGGCC
ACCACGGAGGGCTGCCTAAT CGCCAGCACCAACCGCGGATGCAAGGCCATTGCCGAGTCCGGAGGCGCATCCAGCGTCGT
GTACCGCGACGGGATGACCC GCGCCCCCGTAGCCCGCTTCCCCTCCGCACGACGCGCCGCAGAGCTCAAGGGCTTCCTGG
AGAATCCGGCCAACTACGAC ACCCTGTCCGTGGTCTTTAACAGATCAAGCAGATTTGCAAGGCTGCAGGGGGTCAAGTGC
GCCATGGCTGGGAGGAACTT GTACATGAGGTTCACCTGCAGCACCGGGGATGCCATGGGGATGAACATGGTCTCCAAGGG
CGTCCAAAATGTGCTCGACT ATCTGCAGGAGGACTTCCCTGACATGGACGTTGTCAGCATCTCAGGCAACTTTTGTTCCG
ACAAGAAATCAGCTGCTGTA AACTGGATTGAAGGCCGTGGAAAGTCCGTGGTTTGTGAGGCAGTAATCAGAGAGGAAGTT
GTCCACAAGGTTCTCAAGAC CAACGTTCAGTCACTCGTGGAGTTGAATGTGATCAAGAACCTTGCTGGCTCAGCAGTTGC
TGGTGCTCTTGGGGGTTTCA ACGCCCACGCAAGCAACATCGTAACGGCTATCTTCATTGCCACTGGTCAGGATCCTGCAC
AGAATGTGGAGAGCTCACAG TGTATCACTATGTTGGAAGCTGTAAATGATGGCAGAGACCTTCACATCTCCGTTACAATG
CCATCTATCGAGGTGGGCAC AGTTGGTGGAGGCACGCAGCTGGCCTCACAGTCGGCCTGCTTGGACCTACTGGGCGTCAA
AGGCGCCAACAGGGAATCTC CGGGGTCGAACGCTAGGCTGCTGGCCACGGTGGTGGCTGGTGCCGTCCTAGCTGGGGAGC
TGTCCCTCATCTCCGCCCAA GCTGCCGGCCATCTGGTCCAGAGCCACATGAAATACAACAGATCCAGCAAGGACATGTCC
AAGATCGCCTGCTGA
SEQ ID NO: 25 - AstHMGR (Avena strigosa truncated HMG-CoA reductase) coding sequence (1275bp):
MAPEKMPEEDEEIVAGVVAGKIPSYVLETRLGDCRRAAGIRREALRRITGREIDGLPLDG
FDYDSILGQCCEMPVGYVQL PVGVAGPLVLDGRRIYVPMATTEGCLIASTNRGCKAIAESGGASSVVYRDGMTRAPVARF
PSARRAAELKGFLENPANYD TLSVVFNRSSRFARLQGVKCAMAGRNLYMRFTCSTGDAMGMNMVSKGVQNVLDYLQEDFP
DMDVVSISGNFCSDKKSAAV NWIEGRGKSVVCEAVIREEVVHKVLKTNVQSLVELNVIKNLAGSAVAGALGGFNAHASNI
VTAIFIATGQDPAQNVESSQ CITMLEAVNDGRDLHISVTMPSIEVGTVGGGTQLASQSACLDLLGVKGANRESPGSNARL
LATVVAGAVLAGELSLISAQ AAGHLVQSHMKYNRSSKDMSKIAC*
SEQ ID NO: 26 - AstHMGR (Avena strigosa truncated HMG-CoA reductase) translated nucleotide sequence (424aa):
ATGGGGGCGCTGTCGCGGCCGGAGGAGGTGGTGGCGCTGGTCAAGCTGAGGGTGGCGGCG
GGGCAGATCAAGCGCCAGAT CCCGGCCGAGGAACACTGGGCCTTCGCCTACGACATGCTCCAGAAGGTCTCCCGCAGCTT
CGCGCTCGTCATCCAGCAGC TCGGACCCGAACTCCGCAATGCCGTGTGCATCTTCTACCTCGTGCTCCGGGCCCTGGACA
CCGTCGAGGACGACACCAGC ATCCCCAACGACGTGAAGCTGCCCATCCTTCGGGATTTCTACCGCCATGTCTACAACCCC
GACTGGCGTTATTCATGTGG AACAAACCACTACAAGGTGCTGATGGATAAGTTCAGACTCGTCTCCACGGCTTTCCTGGA
GCTAGGCGAAGGATATCAAA AGGCAATTGAAGAAATCACTAGGCGAATGGGAGCAGGAATGGCAAAATTTATATGCCAGG
AGGTTGAAACGATTGATGAC TATAATGAGTACTGCCACTATGTAGCAGGGCTAGTAGGCTATGGACTTTCCAGGCTCTTT
CATGCTGCTGGGACAGAAGA TCTGGCTTCAGATCAACTTTCGAATTCAATGGGTTTGTTTCTTCAGAAAACCAATATAAT
AAGGGATTATTTGGAGGATA TAAATGAGATACCAAAGTGCCGTATGTTTTGGCCTCGAGAAATATGGAGTAAATATGCAG
ATAAACTTGAGGACCTCAAG TATGAGGAAAATTCAGAAAAAGCAGTGCAATGCTTGAATGATATGGTGACTAATGCTTTG
GTCCACGCCGAAGACTGTCT TCAATACATGTCTGCGTTGAAGGATAATACTAATTTTCGGTTTTGTGCAATACCTCAGAT
AATGGCAATTGGGACATGTG CTATTTGCTACAATAATGTGAAAGTCTTTAGAGGAGTTGTTAAGATGAGGCGTGGGCTCA
CTGCACGAATAATTGATGAG ACAAAATCAATGTCAGATGTCTATTCTGCTTTCTATGAGTTCTCTTCATTGCTAGAGTCA
AAGATTGACGATAACGACCC AAGTTCTGCACTAACACGGAAGCGTGTAGAGGCAATAAAGAGGACTTGCAAGTCATCCGG
TTTACTAAAGAGAAGGGGAT ACGACCTGGAAAAGTCAAAGTATAGGCATATGTTGATCATGCTTGCACTTCTGTTGGTGG
CTATTATCTTCGGTGTACTG TACGCCAAGTGA
SEQ ID NO: 27 - AsSQS (Avena strigosa squalene synthase) coding sequence (1212bp):
MGALSRPEEVVALVKLRVAAGQIKRQIPAEEHWAFAYDMLQKVSRSFALVIQQLGPELRN
AVCIFYLVLRALDTVEDDTS IPNDVKLPILRDFYRHVYNPDWRYSCGTNHYKVLMDKFRLVSTAFLELGEGYQKAIEEIT
RRMGAGMAKFICQEVETIDD YNEYCHYVAGLVGYGLSRLFHAAGTEDLASDQLSNSMGLFLQKTNIIRDYLEDINEIPKC
RMFWPREIWSKYADKLEDLK YEENSEKAVQCLNDMVTNALVHAEDCLQYMSALKDNTNFRFCAIPQIMAIGTCAICYNNV
KVFRGVVKMRRGLTARIIDE TKSMSDVYSAFYEFSSLLESKIDDNDPSSALTRKRVEAIKRTCKSSGLLKRRGYDLEKSK
YRHMLIMLALLLVAIIFGVL YAK*
SEQ ID NO: 28 - AsSQS (Avena strigosa squalene synthase) translated nucleotide sequence (403aa):
ATGAAAAACATGATGAATTATAAATTAAAACTCTGTTCTGTCTCAAAAAACTCAAAAGGA
GTCTCTCTCTCACCTACACC ACACCTAACCAAACCCCCTACGATTCACACAGAGAGAGATCTTCTTCTTCCTTCTTCTTC
CTTCTTCTTTCTTCTTCTTT CTTCTTCTAGCTACAACATCTACAACGCCATGTCCTCTTCTTCTTCTTCGTCAACCTCCA
TGATCGATCTCATGGCAGCA ATCATCAAAGGAGAGCCTGTAATTGTCTCCGACCCAGCTAATGCCTCCGCTTACGAGTCC
GTAGCTGCTGAATTATCCTC TATGCTTATAGAGAATCGTCAATTCGCCATGATTGTTACCACTTCCATTGCTGTTCTTAT
TGGTTGCATCGTTATGCTCG TTTGGAGGAGATCCGGTTCTGGGAATTCAAAACGTGTCGAGCCTCTTAAGCCTTTGGTTA
TTAAGCCTCGTGAGGAAGAG ATTGATGATGGGCGTAAGAAAGTTACCATCTTTTTCGGTACACAAACTGGTACTGCTGAA
GGTTTTGCAAAGGCTTTAGG AGAAGAAGCTAAAGCAAGATATGAAAAGACCAGATTCAAAATCGTTGATTTGGATGATTA
CGCGGCTGATGATGATGAGT ATGAGGAGAAATTGAAGAAAGAGGATGTGGCTTTCTTCTTCTTAGCCACATATGGAGATG
GTGAGCCTACCGACAATGCA GCGAGATTCTACAAATGGTTCACCGAGGGGAATGACAGAGGAGAATGGCTTAAGAACTTG
AAGTATGGAGTGTTTGGATT AGGAAACAGACAATATGAGCATTTTAATAAGGTTGCCAAAGTTGTAGATGACATTCTTGT
CGAACAAGGTGCACAGCGTC TTGTACAAGTTGGTCTTGGAGATGATGACCAGTGTATTGAAGATGACTTTACCGCTTGGC
GAGAAGCATTGTGGCCCGAG CTTGATACAATACTGAGGGAAGAAGGGGATACAGCTGTTGCCACACCATACACTGCAGCT
GTGTTAGAATACAGAGTTTC TATTCACGACTCTGAAGATGCCAAATTCAATGATATAAACATGGCAAATGGGAATGGTTA
CACTGTGTTTGATGCTCAAC ATCCTTACAAAGCAAATGTCGCTGTTAAAAGGGAGCTTCATACTCCCGAGTCTGATCGTT
CTTGTATCCATTTGGAATTT GACATTGCTGGAAGTGGACTTACGTATGAAACTGGAGATCATGTTGGTGTACTTTGTGAT
AACTTAAGTGAAACTGTAGA TGAAGCTCTTAGATTGCTGGATATGTCACCTGATACTTATTTCTCACTTCACGCTGAAAA
AGAAGACGGCACACCAATCA GCAGCTCACTGCCTCCTCCCTTCCCACCTTGCAACTTGAGAACAGCGCTTACACGATATG
CATGTCTTTTGAGTTCTCCA AAGAAGTCTGCTTTAGTTGCGTTGGCTGCTCATGCATCTGATCCTACCGAAGCAGAACGA
TTAAAACACCTTGCTTCACC TGCTGGAAAGGATGAATATTCAAAGTGGGTAGTAGAGAGTCAAAGAAGTCTACTTGAGGT
GATGGCCGAGTTTCCTTCAG CCAAGCCACCACTTGGTGTCTTCTTCGCTGGAGTTGCTCCAAGGTTGCAGCCTAGGTTCT
ATTCGATATCATCATCGCCC AAGATTGCTGAAACTAGAATTCACGTCACATGTGCACTGGTTTATGAGAAAATGCCAACT
GGCAGGATTCATAAGGGAGT GTGTTCCACTTGGATGAAGAATGCTGTGCCTTACGAGAAGAGTGAAAACTGTTCCTCGGC
GCCGATATTTGTTAGGCAAT CCAACTTCAAGCTTCCTTCTGATTCTAAGGTACCGATCATCATGATCGGTCCAGGGACTG
GATTAGCTCCATTCAGAGGA TTCCTTCAGGAAAGACTAGCGTTGGTAGAATCTGGTGTTGAACTTGGGCCATCAGTTTTG
TTCTTTGGATGCAGAAACCG TAGAATGGATTTCATCTACGAGGAAGAGCTCCAGCGATTTGTTGAGAGTGGTGCTCTCGC
AGAGCTAAGTGTCGCCTTCT CTCGTGAAGGACCCACCAAAGAATACGTACAGCACAAGATGATGGACAAGGCTTCTGATA
TCTGGAATATGATCTCTCAA GGAGCTTATTTATATGTTTGTGGTGACGCCAAAGGCATGGCAAGAGATGTTCACAGATCT
CTCCACACAATAGCTCAAGA ACAGGGGTCAATGGATTCAACTAAAGCAGAGGGCTTCGTGAAGAATCTGCAAACGAGTGG
AAGATATCTTAGAGATGTAT GGTAA
SEQ ID NO: 29 - AtATR2 (Arabidopsis thaliana cytochrome P450 reductase 2) coding sequence (2325bp):
MKNMMNYKLKLCSVSKNSKGVSLSPTPHLTKPPTIHTERDLLLPSSSFFFLLLSSSSYNI
YNAMSSSSSSSTSMIDLMAA IIKGEPVIVSDPANASAYESVAAELSSMLIENRQFAMIVTTSIAVLIGCIVMLVWRRSGS
GNSKRVEPLKPLVIKPREEE IDDGRKKVTIFFGTQTGTAEGFAKALGEEAKARYEKTRFKIVDLDDYAADDDEYEEKLKK
EDVAFFFLATYGDGEPTDNA ARFYKWFTEGNDRGEWLKNLKYGVFGLGNRQYEHFNKVAKVVDDILVEQGAQRLVQVGLG
DDDQCIEDDFTAWREALWPE LDTILREEGDTAVATPYTAAVLEYRVSIHDSEDAKFNDINMANGNGYTVFDAQHPYKANV
AVKRELHTPESDRSCIHLEF DIAGSGLTYETGDHVGVLCDNLSETVDEALRLLDMSPDTYFSLHAEKEDGTPISSSLPPP
FPPCNLRTALTRYACLLSSP KKSALVALAAHASDPTEAERLKHLASPAGKDEYSKWVVESQRSLLEVMAEFPSAKPPLGV
FFAGVAPRLQPRFYSISSSP KIAETRIHVTCALVYEKMPTGRIHKGVCSTWMKNAVPYEKSENCSSAPIFVRQSNFKLPS
DSKVPIIMIGPGTGLAPFRG FLQERLALVESGVELGPSVLFFGCRNRRMDFIYEEELQRFVESGALAELSVAFSREGPTK
EYVQHKMMDKASDIWNMISQ GAYLYVCGDAKGMARDVHRSLHTIAQEQGSMDSTKAEGFVKNLQTSGRYLRDVW*
SEQ ID NO: 30 - AtATR2 (Arabidopsis thaliana cytochrome P450 reductase 2) translated nucleotide sequence (774 aa)

ATGGCTGAAGCATCCTCATTTCTTGCACAGAAAAGGTATGCGGTCGTGACAGGAGCAAAC
AAAGGACTAGGACTAGAAAT ATGCGGACAGCTTGCTTCACAGGGGGTGACGGTACTGCTGACATCCAGAGATGAAAAACG
AGGCTTAGAAGCCATTGAGG AGCTTAAGAAATCGGGGATTAATTCGGAAAATCTTGAATATCATCAGCTGGATGTTACTA
AGCCAGCTAGTTTCGCTTCT CTGGCCGATTTCATCAAGGCCAAATTTGGCAAGCTTGATATCCTGGTGAACAATGCAGGG
ATCAGCGGTGTTATTGTAGA TTATGCAGCTTTAATGGAAGCCATTCGCCGTCGAGGGGCAGAGATCAATTACGATGGAGT
GATGAAACAGACCTACGAGC TAGCAGAGGAATGCTTGCAAACAAATTACTATGGTGTGAAAAGAACCATTAATGCTCTCC
TTCCGCTACTTCAGTTTTCC GATTCACCAAGGATCGTCAATGTTTCCTCCGATGTTGGCCTCCTTAAGAAAATACCCGGC
GAGAGAATCAGAGAAGCCTT AGGCGACGTGGAAAAACTTACGGAAGAAAGCGTGGACGGGATTTTAGACGAGTTTCTAAG
AGATTTCAAGGAAGGCAAGA TCGCAGAGAAAGGTTGGCCTACGTTTAAGAGCGCCTATTCAATCTCAAAGGCGGCGCTCA
ATTCGTACACGAGGGTTTTA GCACGGAAATACCCGTCGATCATCATCAACTGTGTCTGCCCGGGTGTCGTCAAAACCGAT
ATCAATCTTAAAATGGGCCA CTTGACGGTTGAAGAAGGCGCGGCCAGTCCCGTGAGGTTAGCACTCATGCCCCTTGGTTC
GCCTTCCGGCCTGTTCTATA CTCGAAACGAAGTAACTCCATTTGAATGA
SEQ ID NO: 31 SoFuSyn coding sequence

MAEASSFLAQKRYAVVTGANKGLGLEICGQLASQGVTVLLTSRDEKRGLEAIEELKKSGI
NSENLEYHQLDVTKPASFAS LADFIKAKFGKLDILVNNAGISGVIVDYAALMEAIRRRGAEINYDGVMKQTYELAEECLQ
TNYYGVKRTINALLPLLQFS DSPRIVNVSSDVGLLKKIPGERIREALGDVEKLTEESVDGILDEFLRDFKEGKIAEKGWP
TFKSAYSISKAALNSYTRVL ARKYPSIIINCVCPGVVKTDINLKMGHLTVEEGAASPVRLALMPLGSPSGLFYTRNEVTP
FE*
SEQ ID NO: 32 SoFuSyn translated nucleotide sequence

ATGGTTCTTAGTCGATTGGATTTTCCGTCCGATTTCATTTTTGGCTCCGGCACGTCAGCT
TCTCAGGTAGAAGGTGCAGC ACTAGAGGATGGGAAGACTTCGACTGCATTTGAAGGATTCTTAACTCGCATGAGTGGAAA
TGATTTGAGCAAAGGAGTTG AAGGCTACTACAAATACAAGGAAGACGTCCAGTTAATGGTGCAAACAGGACTAGATGCAT
ACAGATTCTCCATTTCATGG TCAAGACTAATTCCCGGTGGAAAAGGACCCGTCAACCCAAAAGGTTTACAATATTATAAT
AACTTTATCGACGAACTCAT CAAAAATGGAATACAACCGCACGTTACTCTGCTGCATTTCGACATACCGGACACACTTAT
GACTGCTTATAATGGATTGA AGGGTCAAGAATTTGTGGAAGATTTCACGGCATTTGCTGACGTGTGCTTCAAGGAATTTG
GTGACCGAGTTTTGTATTGG ACGACGGTCAATGAAGCAAATAATTTTGCAAGTCTAACACTCGATGAGGGCAATTTTATG
CCGTCTACTGAACCGTACAT TAGAGGTCACAATATCATTCTTGCTCATGCATCCGCGGTAAAACTATACCGAGAAAAATA
TAAGAAAACCCAAAATGGAT TCATAGGCTTGAATTTATATGCAAGCTGGTATTTTCCCGAGACCGATGACGAACAAGATT
CAATTGCCGCTCAAAGAGCC ATTGATTTTACTATTGGATGGATAATGCAACCATTGATATACGGAGAATATCCAGAAACA
TTGAAGAAACAAGTGGGAGA AAGACTGCCAACATTTACAAAAGAAGAGTCAACGTTCGTTAAAAATTCGTTTGACTTCAT
TGGAGTGAATTGCTACGTCG GCACTGCTGTTAAGGATGACCCTGACAGCTGTAACAGTAAAAATAAAACTATTATTACTG
ACATGTCTGCTAAACTTTCT CCTAAAGGTGAACTAGGAGGAGCGTATATGAAGGGATTGTTGGAATACTTCAAAAGAGAT
TACGGCAATCCGCCAATTTA CATTCAAGAAAATGGTTATTGGACACCGCGTGAATTAGGAGTGAACGATGCGTCAAGGAT
CGAATACCATACTGCTTCTC TTGCTAGCATGCACGATGCTATGAAGAATGGGGCAAATGTAAAGGGATATTTCCAATGGT
CATTTTTGGATCTCTTGGAG GTGTTCAAATACAGCTATGGCCTCTACCATGTCGATTTGGAAGACCCGACCCGAGAAAGA
CGACCCAAGGCATCCGCCAA TTGGTACGCGGAGTTCTTGAAGGGTTGCGCTACTTCTAACGGGAATGCTAAAGTTGAAAC
TCCGTTGTAA
SEQ ID NO: 33 SoGH1 coding sequence

MVLSRLDFPSDFIFGSGTSASQVEGAALEDGKTSTAFEGFLTRMSGNDLSKGVEGYYKYK
EDVQLMVQTGLDAYRFSISW SRLIPGGKGPVNPKGLQYYNNFIDELIKNGIQPHVTLLHFDIPDTLMTAYNGLKGQEFVE
DFTAFADVCFKEFGDRVLYW TTVNEANNFASLTLDEGNFMPSTEPYIRGHNIILAHASAVKLYREKYKKTQNGFIGLNLY
ASWYFPETDDEQDSIAAQRA IDFTIGWIMQPLIYGEYPETLKKQVGERLPTFTKEESTFVKNSFDFIGVNCYVGTAVKDD
PDSCNSKNKTIITDMSAKLS PKGELGGAYMKGLLEYFKRDYGNPPIYIQENGYWTPRELGVNDASRIEYHTASLASMHDA
MKNGA
SEQ ID NO: 34 SoGH1 translated nucleotide sequence

ATGGAACCTTCAAAAATGGAAGTGAAAATAATATCGTCCGAAACCATCAAACCGTCATCT
CCGACACCATCCCACCTTCG AAAATATACACTTTCTTTGCTCGACCAAAAATACACGCCTATCGTTGTTCCGGCCATTCT
ATTCTATGAGCGCCCACAAG GGGTGGCGCCATTGGATATGGACCGTCTCAGAACATGCCTCTCACAGACACTTACCGCGT
TTTACCCTTTAGCCGGACGA GCTGAATCTCGAGACGTTATAATATGTAATGACGAAGGTATCCCCTTCGTTGAGGCTCAT
GTCGATTGTGAACTTTCGAG TGTTGTTAAGTCGCTTTCGTCCCTAGGGAGTGATTTGCGGTCTTTTTACCCGCCTAGGGA
CGGTTTACTCGAGGGGGGAA TTCAGTTTGCTATTCAGATGAATGTGTTTAGTTGTGGCGGGTTTGCGTTCGCGTGGTATT
GCACGCATAACGTTACTGAC GGGACCTCGACTGCTAACTTTTTTAGGTATTGGACTGCGCTGTATGCTCAACGTAGTGAG
TACGCAGTCCAAGACCTAAT GGATTTCAATTCCGTCGTCACTGCCTTTCCCCCTGTGCCGCCCCGTGTACCGCAGGAGGA
AAAACCGGTGACAACGGAAT TGAAACCCGAGAAACAAGAGGGACAAGAAAAGGAGGAAAAGAAAAAATCGTCATTTAATT
TCAGTTTTCAATCTCACATC GTGGCGAGGAGTTTCTTGATAAAGAGCAAGGCGGTCGCAGAGTTGAAGGCCAAGTCGGTA
AGCGAGGAAGTGCCATATCC GAGTCGGTTCGAGGCCGTGTCGGCTTTCCTATGGAAATCGATAGTGTCAAGCTCGACAAC
AGAAGGGAAGACGATGATCA ATATGCCCGTAAACTTGAGACCACGGGTGGACCCGCCATTACCCTTGGACTCCGTAGGTA
ACATTTTCGAAAATGCACTC GTACAGTCCGAGAAAAAAGCGGAGCTCCACGAATTCGTTGCAAGGATCCGTGGATCAATC
TCGAAAATGAAAGATTTTGC CACGGAATATCAAGGCGAAAAGCGGGAAGAAGCTAAGGACGCACATTGGAAAAGATTCAT
AAAAGCGGTTATCGAGTGTA AGGGGAAAGACGCCTACGTAATTTCGCCTTGGTATAAGTCGTCCGGGTTTACGGACATAG
ATTTCGGGTTTGGGACCCCG ATACGGGTCGTACCCATGGACGATGTCGTAAATCATAATCAAAGGAACACGATAATGTTG
ATGGAGTTTGTTGATTCCGA CGGTGATGGATTTGAAGCTTGGATGTTCCTGGAGGAGGAATGTATCAAGTTTTTGGAGTC
CAACCCGGAATTTCTTGCCT TTGCTTCCCCAAACTTTTAA
SEQ ID NO: 35 SoBAHD1 coding sequence

MEPSKMEVKIISSETIKPSSPTPSHLRKYTLSLLDQKYTPIVVPAILFYERPQGVAPLDM
DRLRTCLSQTLTAFYPLAGR AESRDVIICNDEGIPFVEAHVDCELSSVVKSLSSLGSDLRSFYPPRDGLLEGGIQFAIQM
NVFSCGGFAFAWYCTHNVTD GTSTANFFRYWTALYAQRSEYAVQDLMDFNSVVTAFPPVPPRVPQEEKPVTTELKPEKQE
GQEKEEKKKSSFNFSFQSHI VARSFLIKSKAVAELKAKSVSEEVPYPSRFEAVSAFLWKSIVSSSTTEGKTMINMPVNLR
PRVDPPLPLDSVGNIFENAL VQSEKKAELHEFVARIRGSISKMKDFATEYQGEKREEAKDAHWKRFIKAVIECKGKDAYV
ISPWYKSSGFTDIDFGFGTP IRVVPMDDVVNHNQRNTIMLMEFVDSDGDGFEAWMFLEEECIKFLESNPEFLAFASPNF
SEQ ID NO: 36 SoBAHD1 translated nucleotide sequence
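Each coding sequence in the listing is paired with its translated sequence, so the pairing can be checked by translating the nucleotides with the standard genetic code. The following is a minimal Python sketch of such a check, not part of the disclosed method; the function name `translate` and the choice to stop at the first stop codon are assumptions made for illustration. The test input is the first 39 nucleotides of the SoBAHD1 coding sequence given above.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence in frame 1 (hypothetical helper).

    Whitespace is ignored, so sequences can be pasted with the column
    breaks used in the listing. Translation stops after the first stop
    codon, which is reported as '*'. Unknown codons become 'X'.
    """
    cds = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# First 39 nt of the SoBAHD1 coding sequence from the listing.
print(translate("ATGGAACCTTCAAAAATGGAAGTGAAAATAATATCGTCC"))
```

The printed residues can be compared with the start of the corresponding translated sequence in the listing.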
Tables

Name         Accession/GenBank ID   Species
AtBAS        At1g78950              Arabidopsis thaliana
AaBAS        EU330197               Artemisia annua
AsOXA1       AY836006               Aster sedifolius
AsbAS1       AJ311789               Avena strigosa
MtbAS1       AJ430607               Medicago truncatula
PgOSCPNY1    AB009030               Panax ginseng
PsOSCPSY     AB034802               Pisum sativum
SlTTS1       HQ266579               Solanum lycopersicum
VhBS         DQ915167               Vaccaria hispanica
AtCAS1       At2g07050              Arabidopsis thaliana
AsCS1        AJ311790               Avena strigosa
LjOSC5       AB181246               Lotus japonicus
PgOSCPNX1    AB009029               Panax ginseng
PsCASPEA     D89619                 Pisum sativum
LjOSC7       AB244671               Lotus japonicus
GgLUS1       AB116228               Glycyrrhiza glabra
KdLUS        HM623871               Kalanchoe daigremontiana
LjOSC3       AB181245               Lotus japonicus

Table 1. List of literature oxidosqualene cyclase sequences used in phylogenetic analyses.
Primer Name         Sequence (5' to 3')

bAS
FWD-SobAS-attB      GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGTGGAGGTTAAAAATAGCAGAAG
REV-SobAS-attB      GGGGACCACTTTGTACAAGAAAGCTGGGTATTAACTAAGCCTCAAAGGAACATG

CYP450
FWD-SoC28-attb      GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAACTCTTCTTCATATGTGGA
REV-SoC28-attb      GGGGACCACTTTGTACAAGAAAGCTGGGTATTAAGCAGATACAGTTACGGGTTT
FWD-SoC2816-attb    GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGCTAATTACCTTACTAAGTG
REV-SoC2816-attb    GGGGACCACTTTGTACAAGAAAGCTGGGTATTAAGCGAGGGTGGCGGATT
FWD-SoC23-attb      GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGCATAGTAATAGTAAGATGGGTA
REV-SoC23-attb      GGGGACCACTTTGTACAAGAAAGCTGGGTATTAACGTCTACGAAACATGAGAG

CSL
FWD-SoCSL           GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGTCACCCCACAACACCTG
REV-SoCSL           GGGGACCACTTTGTACAAGAAAGCTGGGTATTAAGAGCGACCTTTTCTAGCTTT

UGT
FWD-SoC3Gal         GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGGTTCAAATACAGAAGCAACT
REV-SoC3Gal         GGGGACCACTTTGTACAAGAAAGCTGGGTATCAAGCCTTCCTTAACGATCTC
FWD-SoC3Xyl         GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGAAGTCACCACTAAAGTTGTAC
REV-SoC3Xyl         GGGGACCACTTTGTACAAGAAAGCTGGGTACTAATTAGCAACCTTACTCATTTTATC
FWD-SoC3Fu          GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGTCGGATCAAAATGATAAAAAGGT
REV-SoC3Fu          GGGGACCACTTTGTACAAGAAAGCTGGGTATTAGAAAGATGAAACCCACTCAATAA
FWD-SoC3Rha         GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGTCTGCCAAAATGTTGCACG
REV-SoC3Rha         GGGGACCACTTTGTACAAGAAAGCTGGGTATCACTCGACGAGTGCTTGTAAA
FWD-SoC3Xyl1        GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGGTACTAAAGAGTTACACATAG
REV-SoC3Xyl1        GGGGACCACTTTGTACAAGAAAGCTGGGTACTACTTCTCAACAAGATCTTGTAG
FWD-SoC3Xyl2        GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGGAATCAAAGGAGGAAG
REV-SoC3Xyl2        GGGGACCACTTTGTACAAGAAAGCTGGGTATCAAAATTTTTGTAGCACAGCTTTG
FWD-SoBAHD1         GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAAGTGAAAATTGTACGTAGG
REV-SoBAHD1         GGGGACCACTTTGTACAAGAAAGCTGGGTATTAGCTGGGCGTGGCATATTC

Sequencing
FWD-attL1           TCGCGTTAACGCTAGCATGGATCTC
REV-attL2           ACATCAGAGATTTTGAGACACGGGC

Table 2. Primer oligonucleotide sequences.
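All cloning primers in Table 2 share one of two 5' prefixes, followed by a gene-specific region that begins with ATG in the forward primers and with the reverse complement of a stop codon in the reverse primers. The Python sketch below separates the two parts. The prefixes are read directly off Table 2; treating them as attB recombination adapters is an inference from the primer names, and the helper names `split_primer` and `reverse_complement` are hypothetical.

```python
# Shared 5' prefixes as they appear in Table 2 (assumed to be attB adapters).
ATTB_FWD_PREFIX = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTA"
ATTB_REV_PREFIX = "GGGGACCACTTTGTACAAGAAAGCTGGGTA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_primer(primer):
    """Split a primer into (orientation, gene-specific region).

    Primers without either prefix, such as the sequencing primers,
    are returned unchanged with the label 'no adapter'.
    """
    primer = primer.upper()
    if primer.startswith(ATTB_FWD_PREFIX):
        return "forward", primer[len(ATTB_FWD_PREFIX):]
    if primer.startswith(ATTB_REV_PREFIX):
        return "reverse", primer[len(ATTB_REV_PREFIX):]
    return "no adapter", primer


# SobAS primers from Table 2.
fwd = split_primer("GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGTGGAGGTTAAAAATAGCAGAAG")
rev = split_primer("GGGGACCACTTTGTACAAGAAAGCTGGGTATTAACTAAGCCTCAAAGGAACATG")
print(fwd)
print(rev[0], reverse_complement(rev[1]))
```

Reverse-complementing the gene-specific part of a reverse primer gives the sense-strand 3' end of the target, which should end in a stop codon.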
Contig                              Human Readable Description             PCC
TRINITY_DN1084_c0_g4 (SobAS)        Terpene cyclase/mutase family member   1.00
TRINITY_DN645_c1_g2 (SoC23)         Cytochrome P450                        0.99
TRINITY_DN651_c0_g3 (SoC28)         Cytochrome P450                        0.97
TRINITY_DN5729_c1_g1                Cytochrome P450                        0.97
TRINITY_DN2993_c0_g1                Cytochrome P450                        0.95
TRINITY_DN13626_c1_g2 (SoC28C16)    Cytochrome P450                        0.95
TRINITY_DN58802_c0_g3               Cytochrome P450 family protein         0.93
TRINITY_DN5664_c0_g3                Cytochrome P450                        0.92
TRINITY_DN283414_c0_g1              Cytochrome p450                        0.92
TRINITY_DN8790_c0_g3                Cytochrome P450                        0.91
TRINITY_DN5664_c0_g1                Cytochrome P450                        0.90
TRINITY_DN44858_c0_g1               Cytochrome P450, putative              0.89
TRINITY_DN10048_c0_g1               Cytochrome P450, putative              0.88
TRINITY_DN55859_c0_g1               Cytochrome P450, putative              0.88
TRINITY_DN5555_c0_g1                Cytochrome P450, putative              0.87
TRINITY_DN41487_c0_g1               Cytochrome P450                        0.86
TRINITY_DN183736_c0_g1              Cytochrome P450                        0.86
TRINITY_DN8560_c0_g1                Cytochrome P450, putative              0.86
TRINITY_DN135458_c0_g1              Cytochrome P450, putative              0.85
TRINITY_DN2210_c0_g1                Cytochrome P450                        0.84
TRINITY_DN101327_c0_g6              Cytochrome p450                        0.82
TRINITY_DN7831_c0_g3                Cytochrome P450                        0.81
TRINITY_DN43050_c0_g1               Cytochrome P450                        0.81
TRINITY_DN71147_c0_g2               Cytochrome P4504g15                    0.80
TRINITY_DN78115_c0_g1               Cytochrome P450                        0.80
TRINITY_DN4811_c1_g2                Cytochrome P450                        0.80

Table 3. Correlation analysis of candidate CYP450s and characterized SobAS gene expression pattern using Pearson correlation coefficient (PCC).
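The correlation tables rank candidate contigs by the Pearson correlation coefficient between their expression pattern and that of SobAS. A minimal Python sketch of this kind of ranking is given below for illustration. The function names, the expression values and the 0.5 cutoff used in the example are all hypothetical; the source does not state how expression was quantified or which cutoff was applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def rank_by_coexpression(bait, candidates, threshold):
    """Return (name, PCC) pairs at or above threshold, highest PCC first.

    bait       -- expression profile of the bait gene (here, SobAS)
    candidates -- dict mapping contig name to expression profile
    threshold  -- minimum PCC to keep (an assumed, user-chosen cutoff)
    """
    scored = [(name, pearson(bait, profile))
              for name, profile in candidates.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across four tissues.
bait_profile = [1.0, 2.0, 3.0, 4.0]
candidate_profiles = {
    "contig_a": [2.0, 4.0, 6.0, 8.0],
    "contig_b": [4.0, 3.0, 2.0, 1.0],
    "contig_c": [1.0, 3.0, 2.0, 4.0],
}
for name, pcc in rank_by_coexpression(bait_profile, candidate_profiles, 0.5):
    print(name, round(pcc, 2))
```

In this toy input, the anti-correlated profile is filtered out and the remaining candidates are listed in descending order of PCC, mirroring the layout of the tables.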
Contig                              Human Readable Description             PCC
TRINITY_DN1084_c0_g4 (SobAS)        Terpene cyclase/mutase family member   1.00
TRINITY_DN1618_c1_g2                Glycosyltransferase                    0.99
TRINITY_DN28657_c0_g1 (SoC28Xyl2)   Glycosyltransferase                    0.98
TRINITY_DN5570_c0_g3                Glycosyltransferase                    0.98
TRINITY_DN5701_c1_g1 (SoC28Rha)     Glycosyltransferase                    0.98
TRINITY_DN3554_c0_g2                O-fucosyltransferase                   0.97
TRINITY_DN54808_c0_g7               Glycosyltransferase                    0.97
TRINITY_DN5570_c0_g1                Glycosyltransferase                    0.96
TRINITY_DN51550_c0_g1 (SoC3Gal)     Glycosyltransferase                    0.96
TRINITY_DN347728_c0_g1              Glycosyltransferase                    0.96
TRINITY_DN41181_c0_g1               Glycosyltransferase                    0.95
TRINITY_DN342_c0_g1 (SoC28Fu)       Glycosyltransferase                    0.95
TRINITY_DN5422_c7_g1                UDP-glycosyltransferase                0.95
TRINITY_DN14107_c4_g1 (SoC3Xyl)     Glycosyltransferase                    0.94
TRINITY_DN31287_c0_g2               Glycosyltransferase                    0.91
TRINITY_DN15200_c0_g1               Unknown protein                        0.91
TRINITY_DN586_c1_g1 (SoC28Xyl1)     Glycosyltransferase                    0.91

Table 4. Correlation analysis of candidate UGTs and characterized SobAS gene expression pattern using Pearson correlation coefficient (PCC).

Contig                              Human Readable Description             PCC
TRINITY_DN1084_c0_g4                Terpene cyclase/mutase family member   1.00
TRINITY_DN345366_c0_g1              Cellulose synthase                     0.97
TRINITY_DN23622_c0_g2 (SoCSL)       Cellulose synthase                     0.91
TRINITY_DN46549_c0_g1               Cellulose synthase                     0.90
TRINITY_DN11658_c0_g2               Cellulose synthase                     0.89
TRINITY_DN57970_c0_g1               Cellulose synthase                     0.88
TRINITY_DN86505_c0_g1               Cellulose synthase                     0.86
TRINITY_DN19883_c0_g5               Cellulose synthase                     0.85

Table 5. Correlation analysis of candidate CSLs and characterized SobAS gene expression pattern using Pearson correlation coefficient (PCC).

S. officinalis   Q. saponaria                AA Identity (%)
SobAS            QsbAS                       79.7%
SoC28            QsCYP716-C28                74.8%
SoC28C16         QsCYP716-C16 (short)        49.0%
SoC23            QsCYP714-C-23               33.0%
SoCSL            QsCslG2                     56.0%
SoC3Gal          Qs-3-O-GalT                 46.3%
SoC3Xyl          Qs-3-O-XylT (Qs_0283870)    47.2%
SoC28Fu          Qs-28-O-FucT                43.0%
SoFuSyn          QsFucSyn                    57.2%
SoC28Rha         Qs-28-O-RhaT                29.2%
SoC28Xyl1        Qs-28-O-XylT3               31.1%
SoC28Xyl2        Qs-28-O-XylT4               41.2%

Table 6. Amino acid sequence identity between the products of genes involved in saponarioside biosynthesis in S. officinalis and of QS-21 biosynthetic genes in Q. saponaria.
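Table 6 reports pairwise amino acid identity as a percentage. The source does not state which alignment tool or which denominator was used, so the Python sketch below shows only one common convention, assumed here for illustration: identical residues divided by the number of alignment columns, ignoring columns where both sequences have a gap. The function name `percent_identity` and the short aligned fragments in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Both inputs must come from the same pairwise alignment, have equal
    length, and use '-' for gaps. Columns where both sequences carry a
    gap are skipped. A residue aligned against a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


# Hypothetical aligned fragments: 7 identical positions in 9 columns.
print(round(percent_identity("MEPSK-MEV", "MEPTKAMEV"), 1))
```

Other conventions divide by the length of the shorter sequence or by the number of gap-free columns, and these can give noticeably different percentages for the same alignment.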
References

[1] Jia, Z., Koike, K. and Nikaido, T. (1998). Major triterpenoid saponins from Saponaria officinalis. Journal of Natural Products, 61, 1368-1373.
[2] Eastman, J. (2014). Wildflowers of the Eastern United States: An Introduction to Common Species of Woods, Wetlands and Fields. Stackpole Books.
[3] Rees, A. (1819). The cyclopædia; or, universal dictionary of arts, sciences, and literature (Vol. 4). Longman, Hurst, Rees, Orme and Brown.
[4] Korkmaz, M. and Özçelik, H. (2011). Economic importance of Gypsophila L., Ankyropetalum Fenzl and Saponaria L. (Caryophyllaceae) taxa of Turkey. African Journal of Biotechnology, 10(47), 9533-9541.
[5] Böttger, S. and Melzig, M. F. (2011). Triterpenoid saponins of the Caryophyllaceae and Illecebraceae family. Phytochemistry Letters, 4, 59-68.
[6] - E. (2017). Saponaria officinalis L. extract: Surface active properties and impact on environmental bacterial strains. Colloids and Surfaces B: Biointerfaces, 150, 209-215.
[7] Gonzalez, P. J. and Sörensen, P. M. (2020). Characterization of saponin foam from Saponaria officinalis for food applications. Food Hydrocolloids, 101, 105541.
[8] -Szakiel, M., Paszkiewicz, M., Stochmal, A., Moniuszko-Szajwaj, B., Kowalczyk, M. and (2014). New pharmacological properties of Medicago sativa and Saponaria officinalis saponin-rich fractions addressed to Candida albicans. Journal of Medical Microbiology, 63(8), 1076-1086.
[9] Gilabert-Oriol, R., Thakur, M., Haussmann, K., Niesler, N., Bhargava, C., Görick, C., Fuchs, H. and Weng, A. (2016). Saponins from Saponaria officinalis L. augment the efficacy of a rituximab-immunotoxin. Planta Medica, 82(18), 1525-1531.
[10] Reed, J., Orme, A., El-Demerdash, A., Owen, C., Martin, L. B., Misra, R. C., ... & Osbourn, A. (2023). Elucidation of the pathway for biosynthesis of saponin adjuvants from the soapbark tree. Science, 379(6638), 1252-1264.