Title:
INTERLEAVED SCALAR MULTIPLICATION FOR ELLIPTIC CURVE CRYPTOGRAPHY
Document Type and Number:
WIPO Patent Application WO/2024/091708
Kind Code:
A1
Abstract:
Methods, apparatus, and computer readable storage medium for performing interleaved scalar multiplication are described. The method includes obtaining a bit-number of a scalar; factorizing the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w; generating d tables based on a parameter, each table comprising N entries; for each iteration of s iterations: multiplying a result by two, constructing an index for each table from w bits in the scalar in the binary format, selecting a value from each table based on the constructed index for each table, and adding the value selected from each table to the result and starting next iteration; and in response to completing the s iterations, determining the result for a scalar multiplication between the scalar and the parameter.

Inventors:
FAN XUELEI (US)
Application Number:
PCT/US2023/063760
Publication Date:
May 02, 2024
Filing Date:
March 06, 2023
Assignee:
TENCENT AMERICA LLC (US)
International Classes:
G06F7/72; H04L9/30; G06F7/57; G06F21/60
Attorney, Agent or Firm:
CHENG, Jun et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A method for performing scalar multiplication between a scalar and a parameter, the method comprising: obtaining, by a device comprising a memory storing instructions and a processor in communication with the memory, a bit-number of a scalar, wherein the bit-number of the scalar is a number of bits in the scalar in a binary format; factorizing, by the device, the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w, wherein s, d, and w are positive integers; generating, by the device, d tables based on a parameter, each table comprising N entries, wherein N is a positive integer and a function of w; for each iteration of s iterations: multiplying, by the device, a result by two, constructing, by the device, an index for each table from w bits in the scalar in the binary format, selecting, by the device, a value from each table based on the constructed index for each table, and adding, by the device, the value selected from each table to the result and starting next iteration; and in response to completing the s iterations, determining, by the device, the result for a scalar multiplication between the scalar and the parameter.

2. The method according to claim 1, wherein: N is equal to 2^w.

3. The method according to claim 1, wherein: an entry in the table with a table index of j is generated according to: ( b_0 + b_1 * 2^64 + ... + b_(w-1) * 2^((w-1)*d*s) ) * 2^(j*s) * P, wherein b_0, b_1, ... , b_(w-1) are one-bit binary numbers, {b_0, b_1, ... , b_(w-1)} is an entry index for the entry, j is an integer between 0 and 3, inclusive, and P is the parameter.

4. The method according to claim 1, further comprising: before starting a first iteration of the s iterations, setting the result as zero.

5. The method according to claim 1, wherein: the scalar comprises n bits {k_0, k_1, ... k_(n-1)} in a bit-level little-endian order, n being the bit-number of the scalar; i corresponds to an iteration index among the s iterations, i being an integer between 0 and (s-1), inclusive; and j corresponds to a table index among the d tables, j being an integer between 0 and (d-1), inclusive.

6. The method according to claim 5, wherein the constructing the index for each table from w bits in the scalar in the binary format comprises: constructing the index for each table from the w bits, k_(i+j*s+h*d*s), in the scalar in the bit-level little-endian order, h being an integer between 0 and (w-1), inclusive.

7. The method according to claim 5, wherein: the constructed index comprises w bits of {k_(i+j*s), k_(i+j*s+d*s), ... , k_(i+j*s+(w-1)*d*s)}.

8. The method according to claim 5, wherein the selecting the value from each table based on the constructed index for each table comprises: selecting the value from each table by a constant-time table look-up process based on the constructed index for each table.

9. The method according to claim 1, wherein: in response to the bit-number of the scalar being 256, s is 16, d is 4, and w is 4; and an entry in the table with a table index of j is generated according to: ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^(j*16) * P, wherein b_0, b_1, b_2, and b_3 are one-bit binary numbers, {b_0, b_1, b_2, b_3} is an entry index for the entry, j being an integer between 0 and 3, inclusive, and P is the parameter.

10. The method according to claim 9, wherein: an entry in a first table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * P; an entry in a second table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^16 * P; an entry in a third table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^32 * P; and an entry in a fourth table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^48 * P.

11. The method according to claim 1, wherein: in response to the bit-number of the scalar being 256, s is 16, d is 4, and w is 4; the scalar is represented by n bits {k_0, k_1, ... k_255} in a bit-level little-endian order; i corresponds to an iteration index among the 16 iterations, i being an integer between 0 and 15, inclusive; j corresponds to a table index among the 4 tables, j being an integer between 0 and 3, inclusive; and the constructed index comprises 4 bits of {k_(i+j*16), k_(i+j*16+64), k_(i+j*16+128), k_(i+j*16+192)}.

12. The method according to claim 1, wherein: in response to the bit-number of the scalar being 384, s is 24, d is 4, and w is 4.

13. The method according to claim 1, wherein: in response to the bit-number of the scalar being 512, s is 32, d is 4, and w is 4.

14. An apparatus for performing scalar multiplication between a scalar and a parameter, the apparatus comprising: a memory storing instructions; and a processor in communication with the memory, wherein, when the processor executes the instructions, the processor is configured to cause the apparatus to perform the method in any of claims 1 to 13.

15. A non-transitory computer readable storage medium storing instructions, wherein, when the instructions are executed by a processor, the instructions are configured to cause the processor to perform the method in any of claims 1 to 13.

Description:
INTERLEAVED SCALAR MULTIPLICATION FOR ELLIPTIC CURVE CRYPTOGRAPHY

INCORPORATION BY REFERENCE

[0001] This application is based on and claims the benefit of priority to U.S. Non-Provisional Patent Application No. 17/973,696, filed on October 26, 2022, which is herein incorporated by reference in its entirety.

FIELD OF THE TECHNOLOGY

[0002] The present disclosure relates to scalar multiplication, and in particular, to an interleaved scalar multiplication for elliptic curve cryptography.

BACKGROUND OF THE DISCLOSURE

[0003] Elliptic-curve cryptography (ECC) is a form of public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC may allow much smaller keys compared to some other non-EC public-key based cryptographies while providing equivalent security. Scalar multiplications between scalars and parameters (e.g., EC points) are compute-intensive and dominate the execution time of ECC and some other cryptographic operations. However, there are various issues/problems associated with existing scalar multiplication methods, for example but not limited to, low efficiency and/or low resistance to side-channel attacks.

[0004] The present disclosure describes various embodiments for performing interleaved scalar multiplication for elliptic curve cryptography, addressing at least one of the issues/problems discussed above. The present disclosure improves the technical field of public-key based cryptography, particularly ECC, with at least one of the following improvements: achieving high efficiency for computing scalar multiplications, reducing the execution time for computing scalar multiplications, enhancing resistance to many forms of side-channel attacks, increasing algorithm flexibility in response to hardware constraints, and/or improving cybersecurity in terms of data transmission and storage.
SUMMARY [0005] The present disclosure describes various embodiments of methods, apparatus, and computer-readable storage medium for performing interleaved scalar multiplication for elliptic curve cryptography. [0006] According to one aspect, an embodiment of the present disclosure provides a method for performing scalar multiplication between a scalar and a parameter. The method includes obtaining, by a device, a bit-number of a scalar, wherein the bit-number of the scalar is a number of bits in the scalar in a binary format. The device includes a memory storing instructions and a processor in communication with the memory. The method also includes factorizing, by the device, the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w, wherein s, d, and w are positive integers; generating, by the device, d tables based on a parameter, each table comprising N entries, wherein N is a positive integer and a function of w; for each iteration of s iterations: multiplying, by the device, a result by two, constructing, by the device, an index for each table from w bits in the scalar in the binary format, selecting, by the device, a value from each table based on the constructed index for each table, and adding, by the device, the value selected from each table to the result and starting next iteration; and in response to completing the s iterations, determining, by the device, the result for a scalar multiplication between the scalar and the parameter. [0007] According to another aspect, an embodiment of the present disclosure provides an apparatus for performing scalar multiplication between a scalar and a parameter. The apparatus includes a memory storing instructions; and a processor in communication with the memory. 
When the processor executes the instructions, the processor is configured to cause the apparatus to: obtain a bit-number of a scalar, wherein the bit-number of the scalar is a number of bits in the scalar in a binary format; factorize the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w, wherein s, d, and w are positive integers; generate d tables based on a parameter, each table comprising N entries, wherein N is a positive integer and a function of w; for each iteration of s iterations: multiply a result by two, construct an index for each table from w bits in the scalar in the binary format, select a value from each table based on the constructed index for each table, and add the value selected from each table to the result and start next iteration; and in response to completing the s iterations, determine the result for a scalar multiplication between the scalar and the parameter. [0008] In another aspect, an embodiment of the present disclosure provides a non- transitory computer readable storage medium storing instructions. 
When the instructions are executed by a processor, the instructions cause the processor to: obtain a bit-number of a scalar, wherein the bit-number of the scalar is a number of bits in the scalar in a binary format; factorize the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w, wherein s, d, and w are positive integers; generate d tables based on a parameter, each table comprising N entries, wherein N is a positive integer and a function of w; for each iteration of s iterations: multiply a result by two, construct an index for each table from w bits in the scalar in the binary format, select a value from each table based on the constructed index for each table, and add the value selected from each table to the result and start next iteration; and in response to completing the s iterations, determine the result for a scalar multiplication between the scalar and the parameter. [0009] The above and other aspects and their implementations are described in greater detail in the drawings, the descriptions, and the claims. BRIEF DESCRIPTION OF THE DRAWINGS [0010] FIG.1 is a flow diagram of an embodiment disclosed in the present disclosure. [0011] FIG.2 is a schematic diagram of an electronic device disclosed in the present disclosure. DETAILED DESCRIPTION OF THE DISCLOSURE [0012] The invention will now be described in detail hereinafter with reference to the accompanied drawings, which form a part of the present invention, and which show, by way of illustration, specific examples of embodiments. Please note that the invention may, however, be embodied in a variety of different forms and, therefore, the covered or claimed subject matter is intended to be construed as not being limited to any of the embodiments to be set forth below. Please also note that the invention may be embodied as methods, devices, components, or systems. 
Accordingly, embodiments of the invention may, for example, take the form of hardware, software, firmware or any combination thereof. [0013] Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. The phrase “in one embodiment” or “in some embodiments” as used herein does not necessarily refer to the same embodiment and the phrase “in another embodiment” or “in other embodiments” as used herein does not necessarily refer to a different embodiment. Likewise, the phrase “in one implementation” or “in some implementations” as used herein does not necessarily refer to the same implementation and the phrase “in another implementation” or “in other implementations” as used herein does not necessarily refer to a different implementation. It is intended, for example, that claimed subject matter includes combinations of exemplary embodiments/implementations in whole or in part. [0014] In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more” or “at least one” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a”, “an”, or “the”, again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. 
In addition, the term “based on” or “determined by” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context. [0015] Elliptic curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. Elliptic curves are applicable for key agreement, digital signatures, encryption, key transport and other schemes for general application within computer and communications systems. [0016] ECC may allow much smaller keys compared to some other non-EC public- key based cryptographies while providing equivalent security. Using a smaller key size, ECC may reduce storage and transmission requirements. For example, Rivest–Shamir–Adleman (RSA) based cryptography is a widely used public-key cryptosystem for secure data transmission. In comparison to a popular RSA-based cryptography system with a large modulus and correspondingly larger key, ECC could provide the same level of security afforded by the RSA-based system with a much smaller key: for example, a 256-bit elliptic curve public key may provide comparable security to a 3072-bit RSA public key. [0017] Scalar multiplications are compute-intensive and dominate the execution time of elliptic curve cryptographic operations. For example, when P is a point on an elliptic curve (E) of an order (N) and k is a positive integer, a multiplication between k and P may be defined as repeated addition of P by k times, i.e., k*P = P + P + … + P (there are k Ps adding together). This multiplication operation may be called as elliptic curve scalar multiplication or elliptic curve point multiplication. In some implementations, P may include 2D coordinates (Px, Py), and thus, k*P may refer to k*(Px, Py) = (k*Px, k*Py). 
For scalar multiplication in ECC, one or both of k and P may be very large integer(s), which, for example, may be 256-bit or bigger integer(s).

[0018] Side-channel attacks on implementations of cryptosystems may include deriving secret information (e.g., private keys) by detecting (or measuring) at least one of the following: timing information, memory/cache storage/access information, or power consumption information. In elliptic curve cryptosystems, implementations of multiplication algorithms may be the primary targets for side-channel attacks.

[0019] There are various issues/problems associated with existing scalar multiplication methods, for example but not limited to, low efficiency and/or low resistance to side-channel attacks. The present disclosure describes various embodiments for performing interleaved scalar multiplication for elliptic curve cryptography, addressing at least one of the issues/problems discussed above. The present disclosure improves the technical field of public-key based cryptography, particularly ECC, with at least one of the following improvements: achieving high efficiency for computing scalar multiplications, reducing the execution time for computing scalar multiplications, enhancing resistance to many forms of side-channel attacks, increasing algorithm flexibility in response to hardware constraints, and/or improving cybersecurity in terms of data transmission and storage.

[0020] Various embodiments in the present disclosure describe methods for interleaved scalar multiplication, improving scalar multiplication algorithms in an elliptic curve cryptographic system. The operations based on various embodiments in the present disclosure may be more efficient than some other scalar multiplication algorithms and may be more resistant to side-channel attacks.
As scalar multiplications are compute-intensive and dominate the execution time of elliptic curve cryptographic operations, an ECC system based on various embodiments in the present disclosure may achieve higher performance and throughput in comparison to other systems based on some other scalar multiplication algorithms.

[0021] In some implementations, a scalar k, which has n binary bits, may be represented as an n-bit binary number/integer, (k_0, k_1, …, k_(n-1)), in a bit-level little-endian order, wherein k_0 is the least significant bit and k_(n-1) is the most significant bit.

[0022] In some implementations, scalar multiplications may be performed based on a binary expansion method. For example, the scalar multiplication operation may be expressed as the following algorithm, wherein the returned Q is the result for the scalar multiplication between k and P.

    Q ← 0
    for i from n-1 to 0 do
        Q ← 2Q
        if k_i = 1 then
            Q ← Q + P
    return Q

[0023] For an n-bit scalar k, the complexity of the binary expansion method is about n double operations and n/2 additions (on average, half of the bits of the scalar k are “1” and the other half are “0”). When n is 256, the complexity is about 256 doubles and 128 additions.

[0024] In some implementations, scalar multiplication based on the binary expansion method may be vulnerable to side-channel attacks. For scalar multiplication based on the binary expansion method, given the fact that the addition of “Q + P” is only performed when the corresponding bit is “1” and the addition of “Q + P” is not performed when the corresponding bit is “0”, secret information about bits of the scalar k may be extracted by monitoring power consumption and/or computing time duration.

[0025] In some implementations, scalar multiplications may be performed based on a Montgomery ladder method. For example, the scalar multiplication operation may be expressed as the following algorithm, wherein the returned R0 is the result for the scalar multiplication between k and P.
    R0 ← 0
    R1 ← P
    for i from n-1 to 0 do
        if k_i = 0 then
            R1 ← R0 + R1
            R0 ← 2R0
        else
            R0 ← R0 + R1
            R1 ← 2R1
    return R0

[0026] For an n-bit scalar k, the complexity of the Montgomery ladder method is about n double operations and n additions. If n is 256, the complexity is about 256 doubles and 256 additions.

[0027] In some implementations, scalar multiplication based on the Montgomery ladder method may be inefficient. For scalar multiplication based on the Montgomery ladder method, an addition operation (“R0 + R1”) and a double operation (“2R0” or “2R1”) are performed for each bit in the n-bit scalar k. Since the number of addition operations and double operations does not depend on the value of each bit in the n-bit scalar k, the Montgomery ladder approach computes the point multiplication in a fixed amount of time, which may be beneficial when timing or power consumption measurements are exposed to an attacker performing a side-channel attack.

[0028] In some implementations, scalar multiplications based on the Montgomery ladder method may be vulnerable to side-channel attacks based on information other than timing or power. For example, scalar multiplications based on the Montgomery ladder method may be vulnerable to memory based side-channel attacks (e.g., flush and/or reload of memory/cache).

[0029] In some implementations, scalar multiplications may be performed based on a windowed method, wherein a single table including pre-computed values is used. In the windowed method, a window size w is selected and a table including all 2^w values of k_i*P is pre-computed, for k_i = 0, 1, 2, …, 2^w – 1. The algorithm may use the representation k = k_0 + 2^w * k_1 + 2^(2w) * k_2 + … + 2^(mw) * k_m, where m = n / w for the n-bit integer k. For example, the scalar multiplication operation may be expressed as the following algorithm, wherein the returned Q is the result for the scalar multiplication between k and P.
    Q ← 0
    for i from m to 0 do
        for j from 0 to w-1 do
            Q ← 2Q
        if k_i > 0 then
            Q ← Q + k_i*P    # k_i*P is pre-computed
    return Q

[0030] In some implementations, the value of w in a windowed method may be chosen to be a small number so that the storage of the pre-computed values is of a reasonable size. For a non-limiting example, w=4 may be one of the best choices in practice. For an n-bit scalar k, the complexity of the windowed method is about n point double operations and at most n/w point addition operations.

[0031] In some implementations, scalar multiplication based on the windowed method may be vulnerable to side-channel attacks, given the fact that the point addition (“Q + k_i*P”) is not performed when k_i is zero and is performed when k_i is greater than zero. Secret information about bits of the scalar k may be extracted by monitoring power consumption and/or computing time duration.

[0032] In some implementations, scalar multiplications may be performed based on a safe-select windowed method, which may have high resistance to side-channel attacks. The safe-select windowed method may use a single table including pre-computed values. In the safe-select windowed method, a window size w is selected and a table including all 2^w values is pre-computed. The safe-select windowed method uses safe-select techniques. Safe-select is an implementation-specific function that selects the appropriate table entry without leaking information about the secret scalar. For example, by using safe-select, the scalar multiplication operation based on the safe-select windowed method may be expressed as the following algorithm, wherein the returned Q is the result for the scalar multiplication between k and P.

    Q ← 0
    for i from m to 0 do
        for j from 0 to w-1 do
            Q ← 2Q
        T ← safe-select(k_i)    // the pre-computed k_i*P
        Q ← Q + T
    return Q

[0033] In some implementations, for an n-bit scalar k, the complexity of the safe-select windowed method is about n point double operations and n/w point additions. For a non-limiting example, when n is 256 and w is 4, the complexity is about 256 doubles and 64 additions.

[0034] The present disclosure describes various embodiments for performing interleaved scalar multiplication, which is an improved elliptic curve scalar multiplication algorithm. Various embodiments of interleaved scalar multiplication may be more efficient than the basic binary expansion method and may be more resistant to side-channel attacks. Various embodiments in the present disclosure may be easily implemented in software and/or hardware, increasing performance and throughput of systems that use elliptic curve cryptography.

[0035] FIG. 1 shows a flow diagram of a method 100 for performing interleaved scalar multiplication. The method 100 may include a portion or all of the following steps:

step 110: obtaining a bit-number of a scalar, wherein the bit-number of the scalar is a number of bits in the scalar in a binary format;
step 120: factorizing the bit-number of the scalar into a product of a plurality of factors, the plurality of factors comprising s, d, and w, wherein s, d, and w are positive integers;
step 130: generating d tables based on a parameter, each table comprising N entries, wherein N is a positive integer and a function of w;
for each iteration of s iterations:
    step 142: multiplying a result by two,
    step 144: constructing an index for each table from w bits in the scalar in the binary format,
    step 146: selecting a value from each table based on the constructed index for each table, and
    step 148: adding the value selected from each table to the result and starting next iteration; and/or
step 150: in response to completing the s iterations, determining the result for a scalar multiplication between the scalar and the parameter.
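As a rough illustration of steps 110 and 120 and of how the operation counts compare, the following Python sketch factorizes a bit-number and tabulates approximate doubling and addition counts. The function names are hypothetical, and the defaults d = 4 and w = 4 are taken from the 256-, 384-, and 512-bit examples in the claims. The counts for the earlier methods are the approximate figures quoted in paragraphs [0023], [0026], and [0033]; the count for method 100 is not quoted from the disclosure but derived from steps 142 to 148 (one doubling and d additions per iteration).

```python
def factorize_bit_number(n: int, d: int = 4, w: int = 4) -> tuple[int, int, int]:
    """Split the bit-number n into factors (s, d, w) with s * d * w == n.

    Assumption: d and w are chosen up front and s is derived from them.
    """
    if n <= 0 or d <= 0 or w <= 0 or n % (d * w) != 0:
        raise ValueError("n must be a positive multiple of d * w")
    return n // (d * w), d, w


def operation_counts(n: int, d: int = 4, w: int = 4) -> dict[str, tuple[float, float]]:
    """Approximate (doublings, additions) per method for an n-bit scalar."""
    s, d, w = factorize_bit_number(n, d, w)
    return {
        "binary expansion": (n, n / 2),   # [0023], average case
        "Montgomery ladder": (n, n),      # [0026]
        "windowed": (n, n / w),           # [0033]
        # Derived from steps 142-148: one doubling and d additions per iteration.
        "interleaved": (s, s * d),
    }


if __name__ == "__main__":
    print(factorize_bit_number(256))
    for name, (doubles, adds) in operation_counts(256).items():
        print(f"{name:18s} doublings={doubles:g} additions={adds:g}")
```

Under these assumptions the number of additions for a 256-bit scalar matches the windowed method with w = 4, while the number of doublings drops from n to s. The trade-off is table storage: d tables of N entries each must be pre-computed.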
[0036] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, N is equal to 2^w.

[0037] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, an entry in the table with a table index of j is generated according to: ( b_0 + b_1 * 2^64 + ... + b_(w-1) * 2^((w-1)*d*s) ) * 2^(j*s) * P, wherein b_0, b_1, ... , b_(w-1) are one-bit binary numbers, {b_0, b_1, ... , b_(w-1)} is an entry index for the entry, j is an integer between 0 and 3, inclusive, and P is the parameter.

[0038] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, the method 100 may further include, before starting a first iteration of the s iterations, setting the result as zero.

[0039] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, the scalar comprises n bits {k_0, k_1, ... k_(n-1)} in a bit-level little-endian order, n being the bit-number of the scalar; i corresponds to an iteration index among the s iterations, i being an integer between 0 and (s-1), inclusive; and j corresponds to a table index among the d tables, j being an integer between 0 and (d-1), inclusive.

[0040] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, the constructing the index for each table from w bits in the scalar in the binary format comprises: constructing the index for each table from the w bits, k_(i+j*s+h*d*s), in the scalar in the bit-level little-endian order, h being an integer between 0 and (w-1), inclusive.
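The index construction in paragraph [0040] can be sketched in Python as follows. The helper names are hypothetical, and packing b_0 into the least significant bit of the integer index is an assumption made here for illustration; the disclosure only specifies which scalar bits form the index.

```python
def bit(k: int, pos: int) -> int:
    """Return k_pos, the bit of k at little-endian position pos."""
    return (k >> pos) & 1


def index_bit_positions(i: int, j: int, s: int, d: int, w: int) -> list[int]:
    """Scalar bit positions i + j*s + h*d*s for h = 0 .. w-1."""
    return [i + j * s + h * d * s for h in range(w)]


def table_index(k: int, i: int, j: int, s: int, d: int, w: int) -> int:
    """Pack the w selected scalar bits into an integer table index.

    Assumption: bit h of the returned index holds b_h = k_(i + j*s + h*d*s).
    """
    idx = 0
    for h, pos in enumerate(index_bit_positions(i, j, s, d, w)):
        idx |= bit(k, pos) << h
    return idx


if __name__ == "__main__":
    # 256-bit case: s = 16, d = 4, w = 4; iteration i = 0, table j = 1.
    print(index_bit_positions(0, 1, 16, 4, 4))
```

Across all iterations i, tables j, and offsets h, every bit position of the scalar is used exactly once, which is why the s * d look-ups together account for the whole scalar.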
[0041] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, the constructed index comprises w bits of {k_(i+j*s), k_(i+j*s+d*s), ... , k_(i+j*s+(w-1)*d*s)}.

[0042] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, the selecting the value from each table based on the constructed index for each table comprises: selecting the value from each table by a constant-time table look-up process based on the constructed index for each table.

[0043] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, in response to the bit-number of the scalar being 256, s is 16, d is 4, and w is 4; and an entry in the table with a table index of j is generated according to: ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^(j*16) * P, wherein b_0, b_1, b_2, and b_3 are one-bit binary numbers, {b_0, b_1, b_2, b_3} is an entry index for the entry, j being an integer between 0 and 3, inclusive, and P is the parameter.

[0044] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, an entry in a first table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * P; an entry in a second table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^16 * P; an entry in a third table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^32 * P; and an entry in a fourth table is generated according to ( b_0 + b_1 * 2^64 + b_2 * 2^128 + b_3 * 2^192 ) * 2^48 * P.
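To check that the table formula, the index construction, and the double-then-add loop fit together, the following Python sketch runs the whole procedure in a stand-in group: integers modulo a prime under addition, where k*P is simply (k * P) mod M. This is an assumption made purely so the result can be verified; it is not an elliptic curve, and all function names are hypothetical. Two further assumptions are labeled in the comments: the loop runs from i = s-1 down to 0 (the order needed for the doublings to accumulate correctly, which the paragraphs above do not state), and the mask-based scan only mirrors the shape of a constant-time look-up, since Python integers give no timing guarantees.

```python
import secrets

M = (1 << 255) - 19  # stand-in modulus; NOT an elliptic curve group


def add(a: int, b: int) -> int:
    """Stand-in for point addition."""
    return (a + b) % M


def double(a: int) -> int:
    """Stand-in for point doubling."""
    return (2 * a) % M


def build_tables(P: int, s: int, d: int, w: int) -> list[list[int]]:
    """Table j, entry idx: (sum_h b_h * 2^(h*d*s)) * 2^(j*s) * P, b_h = bit h of idx."""
    tables = []
    for j in range(d):
        table = []
        for idx in range(1 << w):
            coeff = sum(((idx >> h) & 1) << (h * d * s) for h in range(w))
            # Shortcut valid only in the stand-in group: multiply directly.
            table.append(((coeff << (j * s)) * P) % M)
        tables.append(table)
    return tables


def select(table: list[int], idx: int) -> int:
    """Scan every entry and keep the matching one via a mask (illustrative only)."""
    out = 0
    for pos, entry in enumerate(table):
        mask = -int(pos == idx)  # -1 (all ones) on a match, otherwise 0
        out |= entry & mask
    return out


def interleaved_mul(k: int, P: int, n: int = 256, d: int = 4, w: int = 4) -> int:
    """Compute k*P in the stand-in group with s doublings and s*d additions."""
    if n % (d * w) != 0:
        raise ValueError("n must be a multiple of d * w")
    s = n // (d * w)
    tables = build_tables(P, s, d, w)
    result = 0  # the result is set to zero before the first iteration
    # Assumption: iterate from i = s-1 down to 0 so that each doubling
    # shifts the previously added terms up by one bit position.
    for i in reversed(range(s)):
        result = double(result)
        for j in range(d):
            idx = 0
            for h in range(w):
                idx |= ((k >> (i + j * s + h * d * s)) & 1) << h
            result = add(result, select(tables[j], idx))
    return result


if __name__ == "__main__":
    k = secrets.randbits(256)
    P = secrets.randbelow(M)
    print(interleaved_mul(k, P) == (k * P) % M)
```

In a real curve implementation the table entries would themselves be built from point doublings and additions, and the look-up would need a genuinely constant-time primitive provided by the target platform.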
[0045] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, in response to the bit-number of the scalar being 256, s is 16, d is 4, and w is 4; the scalar is represented by n bits {k_0, k_1, ... k_255} in a bit-level little-endian order; i corresponds to an iteration index among the 16 iterations, i being an integer between 0 and 15, inclusive; j corresponds to a table index among the 4 tables, j being an integer between 0 and 3, inclusive; and the constructed index comprises 4 bits of {k_(i+j*16), k_(i+j*16+64), k_(i+j*16+128), k_(i+j*16+192)}.

[0046] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, in response to the bit-number of the scalar being 384, s is 24, d is 4, and w is 4.

[0047] In some implementations according to any one or any combination of the implementations or embodiments described in the present disclosure, in response to the bit-number of the scalar being 512, s is 32, d is 4, and w is 4.

[0048] In the present disclosure, the notation “m^n” represents the exponentiation operation, i.e., m raised to the power n, with m as the base and n as the exponent or power. For example, 2^128 represents 2 raised to the power 128, with 2 as the base and 128 as the exponent or power.

[0049] FIG. 2 shows an example of an electronic device 200 to implement one or more methods described in the present disclosure. In one implementation, the electronic device 200 may be at least one of a computer, a server, a laptop, or a mobile device. In another implementation, the electronic device 200 may be a set of electronic devices comprising at least one of one or more computing servers, one or more data servers, one or more network servers, one or more terminals, one or more laptops, and/or one or more mobile devices.
[0050] The electronic device 200 may include communication interfaces 202, a system circuitry 204, input/output (I/O) interfaces 206, a display circuitry 208, and a storage 209. The display circuitry 208 may include a user interface 210. The system circuitry 204 may include any combination of hardware, software, firmware, or other logic/circuitry. The system circuitry 204 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), discrete analog and digital circuits, and other circuitry. The system circuitry 204 may be a part of the implementation of any desired functionality in the electronic device 200. In that regard, the system circuitry 204 may include logic that facilitates, as examples, decoding and playing music and video, e.g., MP3, MP4, MPEG, AVI, FLAC, AC3, or WAV decoding and playback; running applications; accepting user inputs; saving and retrieving application data; establishing, maintaining, and terminating cellular phone calls or data connections for, as one example, internet connectivity; establishing, maintaining, and terminating wireless network connections, Bluetooth connections, or other connections; and displaying relevant information on the user interface 210. The user interface 210 and the input/output (I/O) interfaces 206 may include a graphical user interface, touch sensitive display, haptic feedback or other haptic output, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements. Additional examples of the I/O interfaces 206 may include microphones, video and still image cameras, temperature sensors, vibration sensors, rotation and orientation sensors, headset and microphone input / output jacks, Universal Serial Bus (USB) connectors, memory card slots, radiation sensors (e.g., IR sensors), and other types of inputs.
[0051] Referring to FIG. 2, the communication interfaces 202 may include wireless transmitters and receivers ("transceivers") and any antennas used by the transmitting and receiving circuitry of the transceivers. The communication interfaces 202 may also include wireline transceivers, which may provide physical layer interfaces for any of a wide range of communication protocols, such as any type of Ethernet, data over cable service interface specification (DOCSIS), digital subscriber line (DSL), Synchronous Optical Network (SONET), or other protocol. The communication interfaces 202 may include a Radio Frequency (RF) transmit (Tx) and receive (Rx) circuitry 216 which handles transmission and reception of signals through one or more antennas 214. The communication interfaces 202 may include one or more transceivers. The transceivers may be wireless transceivers that include modulation / demodulation circuitry, digital to analog converters (DACs), shaping tables, analog to digital converters (ADCs), filters, waveform shapers, pre-amplifiers, power amplifiers and/or other logic for transmitting and receiving through one or more antennas, or (for some devices) through a physical (e.g., wireline) medium. The transmitted and received signals may adhere to any of a diverse array of formats, protocols, modulations (e.g., QPSK, 16-QAM, 64-QAM, or 256-QAM), frequency channels, bit rates, and encodings. As one specific example, the communication interfaces 202 may include transceivers that support transmission and reception under the 2G, 3G, BT, WiFi, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA)+, 4G / Long Term Evolution (LTE), and 5G standards. The techniques described below, however, are applicable to other wireless communications technologies whether arising from the 3rd Generation Partnership Project (3GPP), GSM Association, 3GPP2, IEEE, or other partnerships or standards bodies.
[0052] The system circuitry 204 may include hardware, software, firmware, or other circuitry in any combination. The system circuitry 204 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), microprocessors, discrete analog and digital circuits, and other circuitry. For example, referring to FIG. 2, the system circuitry 204 may include one or more processors 221 and memories 222. The memory 222 stores, for example, an operating system 224, instructions 226, and parameters 228. The processor 221 is configured to execute the instructions 226 to carry out desired functionality for the electronic device 200. The parameters 228 may provide and specify configuration and operating options for the instructions 226. The memory 222 may also store any BT, WiFi, 3G, 4G, 5G or other data that the electronic device 200 will send, or has received, through the communication interfaces 202. In various implementations, a system power for the electronic device 200 may be supplied by a power storage device, such as a battery or a transformer.
[0053] The storage 209 may be used to store various initial, intermediate, or final data. In one implementation, the storage 209 may be integral with a database server. The storage 209 may be centralized or distributed, and may be local or remote to the electronic device 200. For example, the storage 209 may be hosted remotely by a cloud computing service provider.
[0054] The present disclosure describes various embodiments, which may be implemented, partly or totally, on the one or more electronic devices described in FIG. 2.
[0055] In some embodiments, an interleaved scalar multiplication method may be implemented as follows.
[0056] A first step includes selecting proper decomposition factors for a given scalar size in bits. This may be achieved manually or automatically in a computer program.
[0057] A second step includes, according to the selected parameters, generating pre-computed resources for the interleaved scalar multiplication algorithm. This may be achieved automatically in a computer program.
[0058] A third step includes implementing the interleaved scalar multiplication algorithm, with the specific parameters and pre-computed resources. The detailed implementation of the interleaved scalar multiplication algorithm is described in the present disclosure.
[0059] A fourth step includes performing the computation for the specific scalar multiplication.
[0060] The scalar k may be represented as an n-bit integer, (k_0, k_1, … , k_(n-1)), in a bit-level little-endian order. The scalar multiplication operation may be expressed as follows for the binary representation:
k*P = P + P + … + P = k_0*P + k_1*2*P + … + k_(n-1)*2^(n-1)*P.
[0061] In the present disclosure, merely to simplify notation for clear expression, the multiplication symbol (“*”) may be omitted in some formulas, for example, kP may refer to k*P, k_(2s-1) 2^s P may refer to k_(2s-1)*2^s*P, etc.
[0062] The bit length of the scalar k may be factorized/decomposed into a plurality of factors, which may be three factors, four factors, or five factors. For a non-limiting example, the bit length of the scalar k may be factorized/decomposed into three factors, and each factor is a small positive integer: n = s*d*w, wherein s is a sliding size, w is a window size, and d is a dimension size. The scalar multiplication operation may be expressed as follows for the binary representation:
kP = Σ_(i=0..s-1) 2^i (k_i P + k_(i+s) 2^s P + … + k_(i+n-s) 2^(n-s) P)
[0063] The above formula may be expanded as the following.
kP = 2^0 (k_0 P + k_s 2^s P + … + k_(n-s) 2^(n-s) P)
   + 2^1 (k_1 P + k_(s+1) 2^s P + … + k_(n-s+1) 2^(n-s) P)
   + …
   + 2^(s-1) (k_(s-1) P + k_(2s-1) 2^s P + … + k_(n-1) 2^(n-s) P)
   = 2^0 N_0 + 2^1 N_1 + … + 2^(s-1) N_(s-1)
[0064] The N_i in the above formula may be expanded as the following, wherein i is smaller than s, and is larger than or equal to zero.
N_i = k_i P + k_(i+s) 2^s P + … + k_(i+n-s) 2^(n-s) P
    = k_i P + k_(i+ds) 2^(ds) P + … + k_(i+(w-1)ds) 2^((w-1)ds) P
    + k_(i+s) 2^s P + k_(i+(d+1)s) 2^((d+1)s) P + … + k_(i+(w-1)ds+s) 2^((w-1)ds+s) P
    + …
    + k_(i+(d-1)s) 2^((d-1)s) P + k_(i+(2d-1)s) 2^((2d-1)s) P + … + k_(i+n-s) 2^(n-s) P
    = W_i0 + W_i1 + … + W_i(d-1)
[0065] The W_ij in the above formula may be expanded as the following, wherein j is smaller than d, and is larger than or equal to zero.
W_ij = k_(i+js) 2^(js) P + k_(i+js+ds) 2^(js+ds) P + … + k_(i+js+(w-1)ds) 2^(js+(w-1)ds) P
[0066] A plurality of tables including the W_ij values may be pre-computed by combining different values of the associated k_i bits, the details of which are described in other parts of the present disclosure.
[0067] With the parameters and pre-computed resources, the interleaved scalar multiplication may be expressed as the following algorithm, wherein the returned Q is the result for the scalar multiplication between k and P.
Q ← 0
for i from s-1 down to 0 do
    Q ← 2Q
    for j from 0 to d-1 do
        v = {k_(i+js), k_(i+js+ds), … , k_(i+js+(w-1)ds)}
        T = safe-select(j, v)    # the entry indexed by v in the pre-computed table j
        Q ← Q + T
return Q
[0068] For an n-bit scalar k, the complexity of the interleaved method is about s point double operations and n/w point additions. In comparison to a safe-select windowed method, the number of point double operations is reduced from n to s. For a non-limiting example, when n is 256 and (s, d, w) are (16, 4, 4), respectively, the complexity of the interleaved method is about 16 double and 64 addition operations. In comparison to a safe-select windowed method, the point double computation complexity is reduced 16 times (i.e., from 256 double operations to 16 double operations). Table 1 shows an example of the reduction of the complexity for different scalar sizes.
Table 1. Reduction of computation complexity
[0069] In some implementations, proper decomposition factors may be determined for a given scalar size in bits, and this determination may be achieved manually or automatically in a computer program.
For the given scalar length n, decomposing the length into small factors may depend on one or more other constraints. For one example, when a system has a large computing capability and limited memory, a larger sliding size (s) may be considered, so as to accommodate a smaller window size (w) and/or a smaller dimension size (d). For another example, when a system has no constraint on the memory (e.g., a sufficiently large memory size), a larger window size (w) and/or a larger dimension size (d) may be determined, so as to result in a smaller sliding size to reduce computing time.
[0070] For non-limiting examples, Table 2 shows several selections for (s, w, d) for scalars of different sizes. Usually, 4 may be a preferred choice for window size w and dimension size d.
Table 2. Examples of (s, w, d)
[0071] In some implementations, when the scalar length is decomposed into small factors, pre-computed tables including W_ij values as entries may be generated according to the decomposed factors. This may be achieved automatically in a computer program. For the W_ij values, the tables may be pre-computed by combining different values of the associated k_i bits for {2^(js) P, 2^(js+ds) P, …, 2^(js+(w-1)ds) P} according to the following.
W_ij = k_(i+js) 2^(js) P + k_(i+js+ds) 2^(js+ds) P + … + k_(i+js+(w-1)ds) 2^(js+(w-1)ds) P
[0072] There may be d pre-computed tables. Each table contains 2^w elements/entries, and each entry in the table may be indexed by an index (or key) {k_(i+js), k_(i+js+ds), …, k_(i+js+(w-1)ds)}. The index has w bits, and when each bit in the index takes all possible values of “0” or “1”, there are 2^w indexes corresponding to the 2^w elements/entries in the table, respectively.
[0073] For a non-limiting example, when n is 256 and (s, d, w) is (16, 4, 4), there are 4 pre-computed tables, each table contains 16 elements/entries, and each entry is indexed by a 4-bit index value. Table 3 shows a first pre-computed table, which may also be labeled as table number (No.) 0 with j = 0.
Table 3. First pre-computed table (j = 0)
[0074] Table 4 shows a second pre-computed table, which may also be labeled as table number (No.) 1 with j = 1.
Table 4. Second pre-computed table (j = 1)
[0075] Table 5 shows a third pre-computed table, which may also be labeled as table number (No.) 2 with j = 2.
Table 5. Third pre-computed table (j = 2)
[0076] Table 6 shows a fourth pre-computed table, which may also be labeled as table number (No.) 3 with j = 3.
Table 6. Fourth pre-computed table (j = 3)
[0077] In some implementations, for a given scalar k, the constructed indexes for looking up the tables may include secret information (e.g., information about the “0” and/or “1” of the bits in the given scalar). To increase resistance to side-channel attacks, the table look-up may be performed in constant time, using a safe-select algorithm. With a constant-time table look-up algorithm, no secret information may be leaked by a side-channel attack (e.g., by monitoring timing or power consumption).
[0078] In some implementations, after the decomposed factors are determined and pre-computed tables are generated for a given P, the interleaved scalar multiplication may be performed for a given scalar.
[0079] In some implementations, when the scalar size is big and better performance is desired, the scalar size may be decomposed into more than three factors, and each factor is a small positive integer, for example, 4 factors, 5 factors, etc.
[0080] The present disclosure is not limited to expressing the scalar in a little-endian order, which is used to provide exemplary embodiments. In some other embodiments, the scalar may be expressed in a big-endian order, and the method for performing interleaved scalar multiplication may be performed following patterns similar to those described above in the present disclosure, for example, the tables and/or formulas have similar but different patterns.
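For a non-limiting illustration, the interleaved scalar multiplication with a safe-select style table look-up, as described above for the little-endian case, may be sketched in Python as follows. The sketch rests on stated assumptions and is not the disclosed implementation: the additive group of integers modulo q stands in for the elliptic curve group, the function names build_tables, safe_select, and interleaved_mul are hypothetical, bit t of the index is assumed to hold k_(i+js+t*ds), and Python integer arithmetic is not constant-time, so safe_select shows only the access pattern (every entry is read, and the wanted entry is kept by masking).

```python
def build_tables(P, s, d, w, q):
    """Pre-compute d tables of 2^w entries; entry idx of table j is W_ij for index bits idx."""
    tables = []
    for j in range(d):
        table = []
        for idx in range(1 << w):
            m = 0
            for t in range(w):
                m += ((idx >> t) & 1) << (j * s + t * d * s)
            table.append(m * P % q)   # stand-in for the point multiplication m*P
        tables.append(table)
    return tables


def safe_select(table, v):
    """Read every entry and keep only entry v by masking (illustrative access pattern only)."""
    result = 0
    for idx, entry in enumerate(table):
        mask = -(idx == v)            # -1 (all ones) when idx == v, otherwise 0
        result |= entry & mask
    return result


def interleaved_mul(k, tables, s, d, w, q):
    """Compute k*P using s doublings and s*d table additions."""
    Q = 0
    for i in range(s - 1, -1, -1):
        Q = 2 * Q % q                 # Q <- 2Q
        for j in range(d):
            v = 0
            for t in range(w):        # index bits k_(i+js), k_(i+js+ds), ...
                v |= ((k >> (i + j * s + t * d * s)) & 1) << t
            Q = (Q + safe_select(tables[j], v)) % q
    return Q


s, d, w = 16, 4, 4                    # n = s*d*w = 256
q = 2**255 - 19                       # arbitrary modulus for the stand-in group (assumption)
P = 9                                 # arbitrary stand-in for the parameter P (assumption)
k = 0x0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
tables = build_tables(P, s, d, w, q)
assert interleaved_mul(k, tables, s, d, w, q) == k * P % q
```

In this sketch the outer loop performs s = 16 doublings and the inner loop performs s*d = 64 additions, which matches the operation counts given above for the 256-bit case.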
[0081] The present disclosure provides a method, an apparatus, and a non-transitory computer readable storage medium for performing interleaved scalar multiplication. The present disclosure describes various embodiments that may be more efficient than some other methods and more resistant to side-channel attacks.
[0082] The present disclosure provides at least one of the following contributions: by properly decomposing the scalar size, the interleaved scalar multiplication is more efficient than some other methods, for example, the basic binary expansion method and/or the safe-select windowed method; by using a proper safe-select table algorithm, the interleaved scalar multiplication is resistant to side-channel attacks; by introducing the dimension size, the look-up table may be relatively small, so that the safe-select table look-up may be more efficient; and/or the scalar size factors may be customized, so that an implementation with a good choice of the scalar size factors may accommodate and/or balance constraints of CPU and memory for the specific circumstance.
[0083] In the embodiments and implementations of this disclosure, any steps and/or operations may be combined or arranged in any amount or order, as desired. Two or more of the steps and/or operations may be performed in parallel. Embodiments and implementations in the disclosure may be used separately or combined in any order.
[0084] The techniques described above may be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, human accessible storage devices and their associated media include optical media including CD/DVD ROM/RW with CD/DVD or the like media, thumb-drive, removable hard drive or solid state drive, legacy magnetic media such as tape and floppy disc, specialized ROM/ASIC/PLD based devices such as security dongles, and the like.
Those skilled in the art may also understand that the term “computer readable media” as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.
[0085] While the particular invention has been described with reference to illustrative embodiments, this description is not meant to be limiting. Various modifications of the illustrative embodiments and additional embodiments of the invention will be apparent to one of ordinary skill in the art from this description. Those skilled in the art will readily recognize that these and various other modifications can be made to the exemplary embodiments, illustrated and described herein, without departing from the spirit and scope of the present invention. It is therefore contemplated that the appended claims will cover any such modifications and alternate embodiments. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.