Title:
METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR INDEPENDENT PROCESSOR CACHES IN A SCALABLE MULTIPROCESSOR SYSTEM ABSENT A HARDWARE CACHE COHERENCE PROTOCOL
Document Type and Number:
WIPO Patent Application WO/2024/076380
Kind Code:
A1
Abstract:
Methods, systems, and computer program products are provided for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol. The method includes executing an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory and on data resources in a cache memory of at least one processor; locking a shared memory address of the shared memory; retrieving a shared data resource from the shared memory address based on a shared data load instruction; storing the shared data resource in the cache memory of at least one processor to generate a cache data resource; performing an operation on the cache data resource to generate an updated cache data resource; updating the shared data resource of the shared memory based on the updated cache data resource; and unlocking the shared memory address of the shared memory.

Inventors:
YANG FRANKLIN (US)
Application Number:
PCT/US2023/011582
Publication Date:
April 11, 2024
Filing Date:
January 26, 2023
Assignee:
YANG FRANKLIN (US)
International Classes:
G06F12/0806
Foreign References:
US20060155936A1 (2006-07-13)
US9558119B2 (2017-01-31)
Attorney, Agent or Firm:
HARROD, Samuel C. (US)
Claims:
WHAT IS CLAIMED IS:

1. A computer-implemented method, comprising: executing, with at least one processor, an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; locking, with the at least one processor, a shared memory location of the shared memory based on a shared memory address; retrieving, with the at least one processor, a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; storing, with the at least one processor, the shared data resource in the cache memory of the at least one processor to generate a cache data resource; performing, with the at least one processor, an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; updating, with the at least one processor, the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlocking, with the at least one processor, the shared memory location of the shared memory based on the shared memory address.

2. The computer-implemented method of claim 1, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

3. The computer-implemented method of claim 1, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

4. The computer-implemented method of claim 3, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

5. The computer-implemented method of claim 3, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.

6. The computer-implemented method of claim 3, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

7. The computer-implemented method of claim 1, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

8. The computer-implemented method of claim 1, wherein the shared data load instruction comprises a modified load instruction.

9. The computer-implemented method of claim 1, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

10. The computer-implemented method of claim 1, wherein updating the shared data resource at the shared memory address of the shared memory comprises: storing the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

11. The computer-implemented method of claim 10, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

12. The computer-implemented method of claim 1, wherein the shared data load instruction comprises a same format as a load instruction.

13. A computer-implemented method, comprising:

(a) executing, with at least one processor, an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address;

(b) retrieving, with the at least one processor, a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address;

(c) storing, with the at least one processor, the shared data resource in the cache memory of the at least one processor to generate a cache data resource;

(d) performing, with the at least one processor, an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor;

(e) detecting, with the at least one processor, that the shared data resource at the shared memory address of the shared memory is modified;

(f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeating steps (b)-(d); and

(g) updating, with the at least one processor, the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.


14. The computer-implemented method of claim 13, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

15. The computer-implemented method of claim 13, wherein steps (e), (f), and (g) are atomic operations.

16. A system comprising at least one processor programmed or configured to: execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; lock a shared memory location of the shared memory based on a shared memory address; retrieve a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; update the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlock the shared memory location of the shared memory based on the shared memory address.


17. The system of claim 16, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

18. The system of claim 16, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

19. The system of claim 18, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

20. The system of claim 18, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.

21. The system of claim 18, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

22. The system of claim 16, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

23. The system of claim 16, wherein the shared data load instruction comprises a modified load instruction.


24. The system of claim 16, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

25. The system of claim 16, wherein, when updating the shared data resource at the shared memory address of the shared memory, the at least one processor is programmed or configured to: store the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

26. The system of claim 25, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

27. The system of claim 16, wherein the shared data load instruction comprises a same format as a load instruction.

28. A system comprising at least one processor programmed or configured to:

(a) execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address;

(b) retrieve a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address;

(c) store the shared data resource in the cache memory of the at least one processor to generate a cache data resource;

(d) perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor;

(e) detect that the shared data resource at the shared memory address of the shared memory is modified;

(f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeat steps (b)-(d); and

(g) update the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

29. The system of claim 28, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

30. The system of claim 28, wherein steps (e), (f), and (g) are atomic operations.

31. A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; lock a shared memory location of the shared memory based on a shared memory address; retrieve a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; update the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlock the shared memory location of the shared memory based on the shared memory address.

32. The computer program product of claim 31, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

33. The computer program product of claim 31, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

34. The computer program product of claim 33, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

35. The computer program product of claim 33, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.


36. The computer program product of claim 33, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

37. The computer program product of claim 31, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

38. The computer program product of claim 31, wherein the shared data load instruction comprises a modified load instruction.

39. The computer program product of claim 31, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

40. The computer program product of claim 31, wherein the one or more instructions that cause the at least one processor to update the shared data resource at the shared memory address of the shared memory cause the at least one processor to: store the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

41. The computer program product of claim 40, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

42. The computer program product of claim 31, wherein the shared data load instruction comprises a same format as a load instruction.

43. A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to:

(a) execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address;

(b) retrieve a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address;

(c) store the shared data resource in the cache memory of the at least one processor to generate a cache data resource;

(d) perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor;

(e) detect that the shared data resource at the shared memory address of the shared memory is modified;

(f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeat steps (b)-(d); and

(g) update the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

44. The computer program product of claim 43, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

45. The computer program product of claim 43, wherein steps (e), (f), and (g) are atomic operations.


Description:
METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR INDEPENDENT PROCESSOR CACHES IN A SCALABLE MULTIPROCESSOR SYSTEM ABSENT A HARDWARE CACHE COHERENCE PROTOCOL

CROSS-REFERENCE

[0001] This International Application is related to and claims priority to U.S. Provisional Application No. 63/413,284, filed on October 5, 2022, the entire contents of which are incorporated herein by reference.

BACKGROUND

1. Field

[0002] This disclosure relates generally to multiprocessing systems, managing cache memory in a parallel processing environment, and scalable multiprocessor systems with processor cache memory including, in some non-limiting embodiments or aspects, systems, methods, and computer program products for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol.

2. Technical Considerations

[0003] Some computers (e.g., computing devices) may be configured to perform multiple calculations and/or processes simultaneously using a plurality of processors (e.g., parallel computer systems). In some instances, parallel computer systems may allow for larger (e.g., more complex) problems to be solved in a shorter amount of time due to calculations and/or processes being performed simultaneously, rather than the calculations and/or processes being performed sequentially by a single processor of a computing device. With a parallel computer system, each processor of the plurality of processors may have access to shared memory within a computing device and each processor may have a cache memory such that data and/or values (e.g., data resources) stored in shared memory may be read by each processor and written into the cache memory of each processor to perform the calculations and/or processes on the data resources. Once each processor has completed performing the calculations and/or processes on the data, the data may be written back to shared memory.

[0004] When performing calculations and/or processes simultaneously with a parallel computer system, there may be a risk that some data resources may be processed out of order and/or improperly overwritten by another read/write operation of one of the processors of the plurality of processors. For example, two or more processors may read the same data resource residing at a shared memory address in shared memory such that each of the two or more processors generates a local copy of the data resource in the cache memory (e.g., each processor writes a copy of the data resource to their own cache memory). The two or more processors may each perform a different operation on their own local copy of the data resource in their own cache memory simultaneously to generate different results (e.g., new and/or updated data resources), and each of the two or more processors may attempt to write the different results back to the shared memory address (e.g., each processor may attempt to write to the same shared memory address) in shared memory. Generating different results from simultaneous calculation and/or processing by the two or more processors may create inconsistencies in the data resources residing in shared memory and/or in each cache memory of each processor. In some instances, the inconsistencies in the data resources may cause problems in calculations and/or incorrect results of a calculation and/or process. Thus, the parallel computer system may need to make sure that all of the data residing in shared memory and in each cache memory of each processor remains current, coherent, and/or consistent. Some parallel computer systems may implement a hardware cache coherence protocol to remedy some of the inconsistencies that may occur between the data residing in shared memory and/or in each cache memory of each processor.
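By way of non-limiting illustration only (an editorial C sketch, not code from this application), the lost-update hazard described above can be reproduced with two threads that each load, increment, and write back the same shared counter with no synchronization:

    #include <pthread.h>
    #include <stdio.h>

    static long shared_counter = 0;   /* the shared data resource */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            /* Each thread loads the value into its own cache/register,
             * increments the local copy, and writes it back; interleaved
             * read-modify-write sequences can overwrite each other. */
            shared_counter = shared_counter + 1;
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared_counter = %ld\n", shared_counter);  /* expected 2000000 */
        return 0;
    }

Run on a multiprocessor, the program typically prints a total well below 2,000,000 because concurrent read-modify-write sequences discard one another's increments.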

[0005] However, a hardware cache coherence protocol may be dependent on an initial configuration of a plurality of processors in a parallel computer system. For example, a parallel computer system that implements a hardware cache coherence protocol may only function with a specific configuration of processors (e.g., multiprocessors) and may not be scalable (e.g., expanding a configuration of a system such that the system may be able to handle a larger load of requests, calculations, processing, and/or the like) with the addition of more processors to the parallel computer system. The parallel computer system may not be scalable with the addition of more processors because adding each additional processor having a connection to shared memory would introduce additional overhead, requiring a hardware cache coherence protocol to monitor another cache and requiring additional hardware checks. Further, with each addition of another processor to the parallel computer system, the overhead needed to keep all caches coherent and/or consistent increases significantly. For example, in a parallel computer system including a first processor having a first cache and a second processor having a second cache, the first cache must be kept coherent and/or consistent with the second cache. Adding a third processor with a third cache to the parallel computer system would then require the first cache to be coherent and/or consistent with the second cache and the third cache, the second cache to be coherent and/or consistent with the first cache and the third cache, and the third cache to be coherent and/or consistent with the first cache and the second cache. With more processors that are added to the parallel computer system, maintaining coherent and/or consistent caches via a hardware cache coherence protocol becomes increasingly more resource intensive.
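This growth can be stated precisely (an editorial gloss, assuming every pair of caches must be kept mutually consistent): with $n$ processor caches, the number of consistency pairings a hardware protocol must account for is

$$\binom{n}{2} = \frac{n(n-1)}{2},$$

which is 1 pairing for two caches, 3 for three caches, and 28 for eight caches.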

[0006] Sometimes, scaling (e.g., expanding a configuration of a system) a parallel computer system may be accomplished by adding memory and/or increasing an amount of memory resources. However, adding memory and/or increasing the amount of memory resources only allows for additional storage and does not increase processing capabilities for each processor in a parallel computer system. In some instances, the performance of a parallel computer system may be negatively impacted by the addition of processors to a parallel computer system with a hardware cache coherence protocol. The addition of more processors to a parallel computer system may require additional configuration, such as modifying the hardware cache coherence protocol and providing the additional processors with access to cache memory of existing processors in the parallel computer system.

SUMMARY

[0007] Accordingly, provided are methods, systems, and computer program products for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol that overcome some or all of the deficiencies identified above.

[0008] According to non-limiting embodiments or aspects, provided is a computer-implemented method. In some non-limiting embodiments or aspects, the method may include executing an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory including memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor. The shared memory may include a plurality of shared memory locations. Each shared memory location may include a shared memory address. The method may further include locking a shared memory location of the shared memory based on a shared memory address. The method may further include retrieving a shared data resource from the shared memory location of the shared memory based on a shared data load instruction. The shared data load instruction may include the shared memory address. The method may further include storing the shared data resource in the cache memory of the at least one processor to generate a cache data resource. The method may further include performing an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor. The method may further include updating the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor. The method may further include unlocking the shared memory location of the shared memory based on the shared memory address.
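A minimal C sketch of this lock-based sequence follows (illustrative only; the type and function names, and the choice of a POSIX semaphore for the lock described in [0009], are assumptions of the sketch rather than requirements of the method):

    #include <semaphore.h>

    /* One semaphore guards one shared memory location; initialize it
     * with sem_init(&loc->lock, 1, 1) before first use. */
    typedef struct {
        sem_t lock;   /* semaphore guarding this shared memory location */
        long  value;  /* the shared data resource at this address       */
    } shared_loc_t;

    /* Lock, load into the local cache, operate, write back, unlock. */
    long shared_update(shared_loc_t *loc, long (*op)(long))
    {
        sem_wait(&loc->lock);        /* lock the shared memory location   */
        long cached = loc->value;    /* load the shared data resource into
                                        this processor's cache (the cache
                                        data resource)                    */
        cached = op(cached);         /* operate on the cached copy        */
        loc->value = cached;         /* update shared memory from the
                                        updated cache data resource       */
        sem_post(&loc->lock);        /* unlock the shared memory location */
        return cached;
    }

Because no other processor can enter the critical section while the location is locked, each processor's cache may hold its private copy without any hardware coherence traffic.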

[0009] In some non-limiting embodiments or aspects, locking the shared memory location of the shared memory may include locking the shared memory location using a semaphore.

[0010] In some non-limiting embodiments or aspects, a computing device may include a plurality of processors. The plurality of processors may include the at least one processor. Each processor of the plurality of processors may include a cache memory. Each processor of the plurality of processors may be configured to execute the independent-cache collaboration algorithm within the computing device.

[0011] In some non-limiting embodiments or aspects, each processor of the plurality of processors may not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

[0012] In some non-limiting embodiments or aspects, each processor of the plurality of processors may not be capable of accessing the cache memory of each other processor of the plurality of processors. Each cache memory of each processor of the plurality of processors may function independent of values in each other cache memory of each other processor of the plurality of processors.

[0013] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may be configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

[0014] In some non-limiting embodiments or aspects, updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor may be performed using a hardware synchronization primitive.

[0015] In some non-limiting embodiments or aspects, the shared data load instruction may include a modified load instruction.

[0016] In some non-limiting embodiments or aspects, retrieving the shared data resource from the shared memory may be based on a load instruction of the at least one processor.

[0017] In some non-limiting embodiments or aspects, updating the shared data resource at the shared memory address of the shared memory may include storing the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

[0018] In some non-limiting embodiments or aspects, the modified hardware synchronization primitive may include a same format as a hardware synchronization primitive.

[0019] In some non-limiting embodiments or aspects, the shared data load instruction may include a same format as a load instruction.

[0020] According to non-limiting embodiments or aspects, provided is a system including at least one processor and at least one non-transitory computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the computer-implemented method.

[0021] According to non-limiting embodiments or aspects, provided is a computer program product including at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to perform the computer-implemented method.

[0022] According to non-limiting embodiments or aspects, provided is a computer-implemented method. In some non-limiting embodiments or aspects, the method may include (a) executing an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory including memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor. The shared memory may include a plurality of shared memory locations. Each shared memory location may include a shared memory address. The method may further include (b) retrieving a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction. The shared data load instruction may include the shared memory address. The method may further include (c) storing the shared data resource in the cache memory of the at least one processor to generate a cache data resource. The method may further include (d) performing an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor. The method may further include (e) detecting that the shared data resource at the shared memory address of the shared memory is modified. The method may further include, (f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeating steps (b)-(d). The method may further include (g) updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

[0023] In some non-limiting embodiments or aspects, detecting that the shared data resource at the shared memory address of the shared memory is modified may include a spin loop.

[0024] In some non-limiting embodiments or aspects, steps (e), (f), and (g) may include atomic operations.
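Steps (b)-(g) admit a compact rendering in C11 atomics (again an illustrative editorial sketch with assumed names, not code from the application); the compare-and-swap plays the role of the hardware synchronization primitive, and its failure-and-retry path corresponds to the spin loop of steps (e) and (f):

    #include <stdatomic.h>

    long optimistic_update(_Atomic long *shared, long (*op)(long))
    {
        long observed = atomic_load(shared);   /* (b)/(c): load the shared
                                                  data resource into the
                                                  local cache              */
        for (;;) {
            long updated = op(observed);       /* (d): operate on the
                                                  cached copy              */
            /* (e)-(g): the compare-and-swap publishes `updated` only if
             * the shared value still equals `observed`; on failure it
             * refreshes `observed` with the modified value and the loop
             * repeats (b)-(d), spinning until the update succeeds. */
            if (atomic_compare_exchange_weak(shared, &observed, updated))
                return updated;
        }
    }

Because the hardware performs the comparison and the conditional store as one indivisible operation, the detect-and-update path behaves atomically, consistent with [0024].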

[0025] According to non-limiting embodiments or aspects, provided is a system including at least one processor and at least one non-transitory computer-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the computer-implemented method.

[0026] According to non-limiting embodiments or aspects, provided is a computer program product including at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to perform the computer-implemented method.

[0027] Other non-limiting embodiments or aspects will be set forth in the following numbered clauses:

[0028] Clause 1: A computer-implemented method, comprising: executing, with at least one processor, an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; locking, with the at least one processor, a shared memory location of the shared memory based on a shared memory address; retrieving, with the at least one processor, a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; storing, with the at least one processor, the shared data resource in the cache memory of the at least one processor to generate a cache data resource; performing, with the at least one processor, an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; updating, with the at least one processor, the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlocking, with the at least one processor, the shared memory location of the shared memory based on the shared memory address.

[0029] Clause 2: The computer-implemented method of clause 1, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

[0030] Clause 3: The computer-implemented method of clauses 1 or 2, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

[0031] Clause 4: The computer-implemented method of any of clauses 1-3, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

[0032] Clause 5: The computer-implemented method of any of clauses 1-4, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.

[0033] Clause 6: The computer-implemented method of any of clauses 1-5, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

[0034] Clause 7: The computer-implemented method of any of clauses 1-6, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

[0035] Clause 8: The computer-implemented method of any of clauses 1-7, wherein the shared data load instruction comprises a modified load instruction.

[0036] Clause 9: The computer-implemented method of any of clauses 1-8, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

[0037] Clause 10: The computer-implemented method of any of clauses 1-9, wherein updating the shared data resource at the shared memory address of the shared memory comprises: storing the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

[0038] Clause 11: The computer-implemented method of any of clauses 1-10, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

[0039] Clause 12: The computer-implemented method of any of clauses 1-11, wherein the shared data load instruction comprises a same format as a load instruction.

[0040] Clause 13: A computer-implemented method, comprising: (a) executing, with at least one processor, an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; (b) retrieving, with the at least one processor, a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; (c) storing, with the at least one processor, the shared data resource in the cache memory of the at least one processor to generate a cache data resource; (d) performing, with the at least one processor, an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; (e) detecting, with the at least one processor, that the shared data resource at the shared memory address of the shared memory is modified; (f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeating steps (b)-(d); and (g) updating, with the at least one processor, the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

[0041] Clause 14: The computer-implemented method of clause 13, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

[0042] Clause 15: The computer-implemented method of clauses 13 or 14, wherein steps (e), (f), and (g) are atomic operations.

[0043] Clause 16: A system comprising at least one processor programmed or configured to: execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; lock a shared memory location of the shared memory based on a shared memory address; retrieve a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; update the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlock the shared memory location of the shared memory based on the shared memory address.

[0044] Clause 17: The system of clause 16, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

[0045] Clause 18: The system of clauses 16 or 17, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

[0046] Clause 19: The system of any of clauses 16-18, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

[0047] Clause 20: The system of any of clauses 16-19, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.

[0048] Clause 21: The system of any of clauses 16-20, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

[0049] Clause 22: The system of any of clauses 16-21, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

[0050] Clause 23: The system of any of clauses 16-22, wherein the shared data load instruction comprises a modified load instruction.

[0051] Clause 24: The system of any of clauses 16-23, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

[0052] Clause 25: The system of any of clauses 16-24, wherein, when updating the shared data resource at the shared memory address of the shared memory, the at least one processor is programmed or configured to: store the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

[0053] Clause 26: The system of any of clauses 16-25, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

[0054] Clause 27: The system of any of clauses 16-26, wherein the shared data load instruction comprises a same format as a load instruction.

[0055] Clause 28: A system comprising at least one processor programmed or configured to: (a) execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; (b) retrieve a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; (c) store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; (d) perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; (e) detect that the shared data resource at the shared memory address of the shared memory is modified; (f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeat steps (b)-(d); and (g) update the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

[0056] Clause 29: The system of clause 28, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

[0057] Clause 30: The system of clauses 28 or 29, wherein steps (e), (f), and (g) are atomic operations.

[0058] Clause 31: A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; lock a shared memory location of the shared memory based on a shared memory address; retrieve a shared data resource from the shared memory location of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; update the shared data resource at the shared memory location of the shared memory based on the updated cache data resource in the cache memory of the at least one processor; and unlock the shared memory location of the shared memory based on the shared memory address.

[0059] Clause 32: The computer program product of clause 31, wherein locking the shared memory location of the shared memory comprises locking the shared memory location using a semaphore.

[0060] Clause 33: The computer program product of clauses 31 or 32, wherein a computing device comprises a plurality of processors, the plurality of processors comprising the at least one processor, each processor of the plurality of processors comprising a cache memory and configured to execute the independent-cache collaboration algorithm within the computing device.

[0061] Clause 34: The computer program product of any of clauses 31-33, wherein each processor of the plurality of processors does not interact with other processors of the plurality of processors of the computing device to perform a hardware cache coherence protocol.

[0062] Clause 35: The computer program product of any of clauses 31-34, wherein each processor of the plurality of processors is not capable of accessing the cache memory of each other processor of the plurality of processors, and wherein each cache memory of each processor of the plurality of processors functions independent of values in each other cache memory of each other processor of the plurality of processors.

[0063] Clause 36: The computer program product of any of clauses 31-35, wherein the independent-cache collaboration algorithm is configured to cause each processor of the plurality of processors to retrieve a shared data resource from the shared memory.

[0064] Clause 37: The computer program product of any of clauses 31-36, wherein updating the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor is performed using a hardware synchronization primitive.

[0065] Clause 38: The computer program product of any of clauses 31-37, wherein the shared data load instruction comprises a modified load instruction.

[0066] Clause 39: The computer program product of any of clauses 31-38, wherein retrieving the shared data resource from the shared memory is based on a load instruction of the at least one processor.

[0067] Clause 40: The computer program product of any of clauses 31-39, wherein the one or more instructions that cause the at least one processor to update the shared data resource at the shared memory address of the shared memory cause the at least one processor to: store the updated cache data resource at the shared memory address of the shared memory based on a modified hardware synchronization primitive.

[0068] Clause 41 : The computer program product of any of clauses 31-40, wherein the modified hardware synchronization primitive comprises a same format as a hardware synchronization primitive.

[0069] Clause 42: The computer program product of any of clauses 31-41, wherein the shared data load instruction comprises a same format as a load instruction.

[0070] Clause 43: A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: (a) execute an independent-cache collaboration algorithm configured to perform operations on data resources in shared memory comprising memory that is accessible by a plurality of processors and on data resources in a cache memory of the at least one processor, the shared memory comprising a plurality of shared memory locations, each shared memory location comprising a shared memory address; (b) retrieve a shared data resource of shared data from a shared memory address of the shared memory based on a shared data load instruction, the shared data load instruction comprising the shared memory address; (c) store the shared data resource in the cache memory of the at least one processor to generate a cache data resource; (d) perform an operation on the cache data resource in the cache memory of the at least one processor to generate an updated cache data resource in the cache memory of the at least one processor; (e) detect that the shared data resource at the shared memory address of the shared memory is modified; (f) in response to detecting that the shared data resource at the shared memory address of the shared memory is modified, repeat steps (b)-(d); and (g) update the shared data resource at the shared memory address of the shared memory based on the updated cache data resource in the cache memory of the at least one processor.

[0071] Clause 44: The computer program product of clause 43, wherein detecting that the shared data resource at the shared memory address of the shared memory is modified comprises a spin loop.

[0072] Clause 45: The computer program product of clauses 43 or 44, wherein steps (e), (f), and (g) are atomic operations.

[0073] These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structures and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0074] Additional advantages and details are explained in greater detail below with reference to the non-limiting embodiments that are illustrated in the accompanying schematic figures, in which:

[0075] FIG. 1 is a schematic diagram of a system for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects;

[0076] FIG. 2 is a flow diagram of a process for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects;

[0077] FIG. 3 is a flow diagram of another process for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects;

[0078] FIG. 4 is a schematic diagram of a system for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects;

[0079] FIG. 5 is a diagram of an exemplary environment in which methods, systems, and/or computer program products, described herein, may be implemented according to some non-limiting embodiments or aspects; and

[0080] FIG. 6 is a schematic diagram of example components of one or more devices of FIG. 1 and/or FIG. 5 according to some non-limiting embodiments or aspects.

DETAILED DESCRIPTION

[0081] For purposes of the description hereinafter, the terms “end,” “upper,” “lower,” “right,” “left,” “vertical,” “horizontal,” “top,” “bottom,” “lateral,” “longitudinal,” and derivatives thereof shall relate to the embodiments as they are oriented in the drawing figures. However, it is to be understood that the embodiments may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary embodiments or aspects of the disclosed subject matter. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting.

[0082] No aspect, component, element, structure, act, step, function, instruction, and/or the like used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more” and “at least one.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like) and may be used interchangeably with “one or more” or “at least one.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise.

[0083] As used herein, the term “communication” may refer to the reception, receipt, transmission, transfer, provision, and/or the like of data (e.g., information, signals, messages, instructions, commands, and/or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and/or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit processes information received from the first unit and communicates the processed information to the second unit.

[0084] As used herein, the term “computing device” may refer to one or more electronic devices configured to process data. A computing device may, in some examples, include the necessary components to receive, process, and output data, such as a processor, a display, a memory, an input device, a network interface, and/or the like. A computing device may be a mobile device. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices. A computing device may also be a desktop computer or other form of non-mobile computer.

[0085] As used herein, the terms “client” and “client device” may refer to one or more client-side devices or systems used to initiate or facilitate a network connection. As an example, a “client device” may refer to one or more computing devices used by a user, one or more personal computers used by a user, one or more mobile devices used by a user, and/or the like. In some non-limiting embodiments or aspects, a client device may be an electronic device configured to communicate with one or more networks. For example, a client device may include one or more computers, portable computers, laptop computers, tablet computers, mobile devices, cellular phones, wearable devices (e.g., watches, glasses, lenses, clothing, and/or the like), PDAs, and/or the like. Moreover, a “client” may also refer to an entity (e.g., a user, a corporation, and/or the like) that owns, utilizes, and/or operates a client device.

[0086] As used herein, the term “server” may refer to or include one or more computing devices that are operated by or facilitate communication and processing for multiple parties (e.g., clients, client devices, users, and/or the like) in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computing devices (e.g., servers, mobile devices, etc.) directly or indirectly communicating in the network environment may constitute a “system.” Reference to “a server” or “a processor,” as used herein, may refer to a previously-recited server and/or processor that is recited as performing a previous step or function, a different server and/or processor, and/or a combination of servers and/or processors. For example, as used in the specification and the claims, a first server and/or a first processor that is recited as performing a first step or function may refer to the same or different server and/or a processor recited as performing a second step or function.

[0087] Non-limiting embodiments or aspects of the disclosed subject matter are directed to systems, methods, and computer program products for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol. For example, non-limiting embodiments or aspects of the disclosed subject matter provide methods, systems, and computer program products for executing an independent-cache collaboration algorithm (e.g., an algorithm that is included in and/or is part of a software program and/or application). The independent-cache collaboration algorithm may be configured to perform operations on data resources. Data resources may reside in shared memory (e.g., a shared memory location) and/or in a cache memory of at least one processor. The shared memory may include memory that is accessible by a plurality of processors and the shared memory may include a plurality of shared memory addresses. Each shared memory address of the shared memory may correspond to a shared memory location. A shared memory address of the shared memory may be locked. A shared data resource of shared data may be retrieved from the shared memory address of the shared memory based on a shared data load instruction. The shared data load instruction may include the shared memory address of a shared memory location of shared memory. The shared data resource of the shared data may be stored (e.g., copied from the shared memory address and stored) in the cache memory of at least one processor to generate a cache data resource. An operation may be performed on the cache data resource in the cache memory of at least one processor to generate an updated cache data resource in the cache memory of at least one processor. The shared data resource (e.g., the value of the shared data resource) of the shared data at the shared memory address of the shared memory may be updated based on the updated cache data resource in the cache memory of at least one processor. The shared memory address of the shared memory location may be unlocked (e.g., after the shared data resource at the shared memory address is updated based on the updated cache data resource).

[0088] Non-limiting embodiments or aspects provide techniques and systems that facilitate an independent-cache collaboration algorithm (e.g., a software application, software-based algorithm, and/or the like) that may be independent of an initial configuration of a plurality of processors in a parallel computer system such that the parallel computer system may be scalable. For example, non-limiting embodiments or aspects of the systems, methods, and computer program products described herein may not implement a hardware cache coherence protocol and may function with multiple configurations of processors where each processor is configured to execute the independent-cache collaboration algorithm (e.g., an instance of the independent-cache collaboration algorithm). In this way, each processor of a system may execute an instance of the independent-cache collaboration algorithm, and each processor may include an independent cache memory (e.g., each processor may include a dedicated cache memory for the processor, and the dedicated cache memory may not communicate with any other processors or any other cache memories in the system). Because the system does not implement a hardware cache coherence protocol, the system does not experience any cache misses (e.g., a request to retrieve data from a cache does not fail) and any data resource that is read from cache memory by a processor can be immediately written to shared memory without any hardware cache coherence protocol rules, such as checking the value of the same data resource in other cache memories, or changing the state of a data resource and/or a cache memory block (e.g., a portion of memory in a cache memory associated with a cache memory address, the portion of memory configured to store a data resource).

[0089] Additionally, non-limiting embodiments or aspects of the systems, methods, and products described herein may scale (e.g., expand a configuration such that the configuration may handle additional computing loads and/or processing) with the addition of more processors without requiring additional configuration and/or changes to the hardware and/or computing devices executing the independent-cache collaboration algorithm. For example, additional processors, each additional processor associated with a cache memory (e.g., an independent cache memory), may be added to the system, with each additional processor executing an instance of the independent-cache collaboration algorithm. Including additional processors may not require additional configuration and/or changes to the system and may allow for faster performance than other existing architectures.

[0090] Further, adding additional processors does not result in additional computational overhead as a total processing time would no longer be dependent on an amount of load (e.g., task load) since tasks (e.g., computational tasks, jobs, and/or the like) no longer need to wait in a queue and/or get swapped out by separate processors. Non-limiting embodiments or aspects may provide a greater number of processors in the system than the number of tasks to be performed simultaneously, thus eliminating the need for processors to multitask. The design of non-limiting embodiments or aspects allows each processor to focus on a specific task, and each processor may wait for shared data resources to become available if a shared data resource is needed to complete a task. This way, each processor does not need to process a queue of tasks, swapping tasks in and out, and sharing cache data with other processors and cache memories. For example, each task in the system may execute as a stand-alone task on a single processor, and no processors need to multitask. With each processor executing a single task, tasks that would normally execute in the background (e.g., a background task) may be executed on a dedicated processor (e.g., exactly one task per processor) and the task may execute continuously without being subject to a queue or prioritization of tasks for the dedicated processor. A result of each processor executing a single task and each processor waiting for shared data resources is that each processor does not need to multitask, and each processor consumes a negligible amount of a total utilization of a processor while waiting for shared data resources.

[0091] In this way, the performance of non-limiting embodiments or aspects of the systems, methods, and/or computer program products described herein may not be negatively impacted by the addition of processors and the performance of the systems, methods, and/or computer program products may be improved through scaling with the implementation of the independent-cache collaboration algorithm. Each processor of the system executing an instance of the independent-cache collaboration algorithm may collaborate with each other processor of the system such that each processor operates on shared data resources (e.g., each processor shares data stored in shared memory with each other processor in the system) without communicating directly with other processors or other cache memories in the system. Additionally, each processor is associated with an independent cache memory on which only the processor associated with the independent cache memory may operate (e.g., each processor does not collaborate with other cache memories of other processors and each cache memory is independent of each other cache memory in the system; each processor is associated with its own cache memory). In this way, each cache memory is associated with at least one processor and each cache memory is unaffected by other processors and/or each cache memory is unaffected by modifications made to other cache memories in the system. Each processor only has access to its own independent cache memory and each processor does not have access to each other independent cache memory associated with each other processor in the system. Thus, non-limiting embodiments or aspects reduce the resources required to perform configuration and/or testing when additional processors are added to improve performance.

[0092] Additionally, such non-limiting embodiments or aspects of the systems, methods, and products described herein may involve each processor of a plurality of processors executing an instance of the independent-cache collaboration algorithm. In this way, non-limiting embodiments or aspects may allow each processor of a plurality of processors (e.g., in a multiprocessor system) to maintain coherent and/or consistent data resources in cache memory independent of the other processors of the plurality of processors to reduce errors in calculations and/or processing. Each processor may run an instance (e.g., an independent instance) of the independent-cache collaboration algorithm, thus maintaining independent cache memory to execute the independent-cache collaboration algorithm where the independent cache memory is not shared with other processors in a system which may be configured to execute separate instances of the independent-cache collaboration algorithm. In this way, additional processors may be added to improve performance of the system without impacting the performance of the existing processors in the system. Further, each processor in the system may maintain algorithmic independence from each other processor when additional processors are added. Thus, direct communication (e.g., communication without using at least one intermediary, such as shared memory) between different processors, direct communication between different caches, and direct communication between processors and caches associated with other processors is eliminated, and the requirement to keep data coherent and/or consistent between multiple caches is eliminated, allowing for a single location where data is coherent and/or consistent and where data is read and written by each processor in the system: the shared memory.

[0093] FIG. 1 depicts a system 100 for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects. System 100 may include independent-cache collaboration system 102, processor 104-1 to processor 104-n (e.g., a plurality of processors, referred to individually as processor 104 and collectively as processors 104 where appropriate), cache memory 106-1 to cache memory 106-n (e.g., a plurality of cache memories, referred to individually as cache memory 106 and collectively as cache memories 106 where appropriate), and shared memory 108.

[0094] Independent-cache collaboration system 102 may include a computing device, such as a server (e.g., a single server), a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 may include a processor and/or memory as described herein. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 may include one or more client devices. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 may include one or more servers and/or one or more client devices executing instructions (e.g., software instructions) that cause independent-cache collaboration system 102 to perform one or more steps of methods as described herein.

[0095] Processor 104 (e.g., processor 104-1 to processor 104-n) may include at least one processor (e.g., a multi-core processor), such as a central processing unit (CPU), an accelerated processing unit (APU), a graphics processing unit (GPU), a microprocessor, and/or the like. In some non-limiting embodiments or aspects, processor 104 may include at least one processor having a single core (e.g., a single-core processor) or at least one processor having multiple cores (e.g., a multiprocessor, a multi-core processor, a processor including more than one core, and/or the like). In some non-limiting embodiments or aspects, processor 104 may include at least one core (e.g., a core of a processor associated with a dedicated L1 cache) that is a component of (e.g., part of) a single-core processor or a multiprocessor.

[0096] In some non-limiting embodiments or aspects, processor 104 (e.g., processors 104-1 to 104-n) may be programmed to perform one or more steps of methods described herein. In some non-limiting embodiments or aspects, processors 104 may include one or more processors executing instructions (e.g., software instructions) that cause processors 104 to perform one or more steps of methods as described herein. In some non-limiting embodiments or aspects, processor 104 may be in communication with cache memory 106 (e.g., cache memories 106-1 to 106-n). In some non-limiting embodiments or aspects, processor 104 may be capable of receiving information (e.g., data, data resources, and/or the like) from and/or communicating (e.g., transmitting) information to cache memory 106. In some non-limiting embodiments or aspects, processor 104 may execute an instance of an independent-cache collaboration algorithm (e.g., an instance of an algorithm configured to keep information and/or data resources stored in cache memory 106 coherent and/or consistent with information and/or data resources stored in shared memory 108 for processing by processor 104 and/or the like).

[0097] In some non-limiting embodiments or aspects, at least one processor 104 may execute an operating system where the operating system is a multitasking operating system which may require a hardware cache coherence protocol, but the operating system may be modified to execute the independent-cache collaboration algorithm. For example, at least one processor 104 may execute the operating system where the operating system is modified such that the operating system does not require a hardware cache coherence protocol and the operating system is modified to execute the independent-cache collaboration algorithm (e.g., a modified multitasking operating system). In some non-limiting embodiments or aspects, at least one processor 104 may execute an operating system where the operating system is a non-multitasking atomic operating system which executes tasks from a shared memory queue (e.g., a queue of one or more tasks stored in shared memory 108). In some non-limiting embodiments or aspects, at least one processor 104 may execute either a modified multitasking operating system or a non-multitasking atomic operating system that may execute tasks from a shared memory queue. In some non-limiting embodiments or aspects, at least one processor 104 may add tasks to a shared memory queue, independent of whether processor 104 is executing a modified multitasking operating system or a non-multitasking atomic operating system.

[0098] In some non-limiting embodiments or aspects, the modified multitasking operating system or the non-multitasking atomic operating system may reside partially or wholly in a common static cache memory resource. In some non-limiting embodiments or aspects, the modified multitasking operating system and the non-multitasking atomic operating system may provide access to a common static cache memory resource while the independent-cache collaboration algorithm may provide access to independent cache memory addresses and shared memory addresses.

[0099] In some non-limiting embodiments or aspects, at least one processor 104 may continue execution while the at least one processor 104 requires a result (e.g., waits for a result that may be required for further execution) from a task that the at least one processor 104 added to a shared memory queue. To continue execution while the at least one processor 104 requires a result from the task, the at least one processor 104 may wait until the task added to the shared memory queue has a result (e.g., a result associated with the task added to the shared memory queue) stored in a shared memory location of shared memory 108. In this manner, the at least one processor 104 and tasks the at least one processor 104 has initiated may run and/or be executed concurrently. In some non-limiting embodiments or aspects, processor 104 may not execute any operating systems that require multitasking. In this way, multitasking of conventional computing systems may be replaced by concurrent processing.
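
A minimal sketch, in C, of how an initiating processor may continue execution and later wait for a queued task's result to appear in a shared memory location, as described above. The result slot, the ready flag, and the names result_slot_t, await_result, and publish_result are hypothetical and provided only for illustration.

    #include <stdatomic.h>

    /* Hypothetical result slot in shared memory: the queued task writes its
     * result and then raises the ready flag; the initiator polls the flag. */
    typedef struct {
        atomic_int ready;   /* 0 = pending, 1 = result available   */
        long       value;   /* result produced by the queued task  */
    } result_slot_t;

    /* Initiating processor: continue other work, then call this only when
     * the result is actually required for further execution.              */
    long await_result(result_slot_t *slot)
    {
        while (atomic_load(&slot->ready) == 0)
            ;   /* wait until the queued task stores its result */
        return slot->value;
    }

    /* Queued task: publish the value before raising the flag so that the
     * initiator never observes the flag without the value.                */
    void publish_result(result_slot_t *slot, long v)
    {
        slot->value = v;
        atomic_store(&slot->ready, 1);
    }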

[0100] In some non-limiting embodiments or aspects, each processor 104 of a plurality of processors 104 may not be configured to interact with other processors 104 of the plurality of processors 104 to perform a hardware cache coherence protocol. For example, each processor 104 of the plurality of processors 104 may execute an independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 104 to interact with cache memory 106 of each processor respectively (e.g., processor 104-1 may interact with cache memory 106-1, processor 104-2 may interact with cache memory 106-2, etc.). Each processor 104 of the plurality of processors 104 may execute the independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 104 to interact with shared memory 108. Each processor 104 of the plurality of processors 104 may execute the independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 104 to not interact (e.g., be unable to interact, be unable to access, and/or the like) with cache memory 106 of each other processor (e.g., processor 104-1 is not capable of interacting with cache memory 106-2, 106-3, etc., and processor 104-2 is not capable of interacting with cache memory 106-1, 106-3, etc.).

[0101] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm executing on each processor 104 may include a software application and/or software program that may be reentrant. For example, a first instance of the independent-cache collaboration algorithm may be interrupted while executing on processor 104-1 and the independent-cache collaboration algorithm (e.g., the software program of the independent-cache collaboration algorithm) may be subsequently called (e.g., commanded to execute by a processor) to initiate a second instance of the independent-cache collaboration algorithm executing on processor 104-1 before the first instance of the independent-cache collaboration algorithm completes execution.
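
For illustration only, the following C sketch contrasts a reentrant function, whose state lives entirely in parameters and local variables, with a non-reentrant variant that keeps state in a static variable; both functions are hypothetical and are provided solely to make the reentrancy property concrete.

    /* Reentrant: all state is in parameters and locals, so an interrupted
     * call can be re-entered safely before the first call completes.      */
    int accumulate(const int *data, int n)
    {
        int sum = 0;                  /* local state only                   */
        for (int i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }

    /* NOT reentrant: the static variable is shared by every invocation, so
     * a second entry before the first returns would corrupt the result.   */
    int accumulate_static(const int *data, int n)
    {
        static int sum;               /* state shared across invocations    */
        sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }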

[0102] Cache memory 106 (e.g., cache memories 106-1 to 106-n) may include cache memory internal to and/or associated with a processor (e.g., processors 104-1 to 104-n). In some non-limiting embodiments or aspects, cache memory 106 may include a level 1 and/or primary cache, a level 2 and/or secondary cache, and/or a level 3 and/or tertiary cache. In some non-limiting embodiments or aspects, cache memory 106 may include a storage component (e.g., a volatile storage component) that stores information and/or instructions for use by processor 104. In some non-limiting embodiments or aspects, cache memory 106 may include CPU memory of processor 104. In some non-limiting embodiments or aspects, cache memory 106 may include random access memory (RAM) and/or another type of static storage device.

[0103] In some non-limiting embodiments or aspects, cache memory 106 may store information and/or software related to the operation and use of independent-cache collaboration system 102 and/or processor 104. For example, cache memory 106 may include a type of computer-readable medium. In some non-limiting embodiments or aspects, each cache memory 106 may transmit information to and/or receive information from each processor 104 respectively (e.g., cache memory 106-1 may transmit information to and/or receive information from processor 104-1, cache memory 106-2 may transmit information to and/or receive information from processor 104-2, etc.). In some non-limiting embodiments or aspects, each cache memory 106 may be implemented by (e.g., part of) each processor 104 respectively (e.g., cache memory 106-1 may be part of processor 104-1, cache memory 106-2 may be part of processor 104-2, etc.). In some non-limiting embodiments or aspects, cache memory 106 may be referred to as independent-cache memory, as each cache memory 106 may only be accessible by processor 104 associated with cache memory 106 (e.g., cache memory 106-1 is an independent cache memory to processor 104-1 as it is only accessible by processor 104-1, cache memory 106-2 is an independent cache memory to processor 104-2 as it is only accessible by processor 104-2, etc.).

[0104] In some non-limiting embodiments or aspects, cache memory 106 may be an independent cache due to hardware constraints (e.g., one hardware-based cache memory associated with each processor, where other processors are not physically connected to the hardware-based cache memory and cannot communicate with the hardware-based cache memory). Additionally or alternatively, cache memory 106 may be an independent cache due to logical constraints (e.g., a hardware-based pool of cache memory may be logically divided into different cache memory resources, each cache memory resource only accessible by one processor). In some non-limiting embodiments, cache memories 106 (e.g., a plurality of cache memories 106) may include some cache memories 106 that are independent due to hardware constraints, some cache memories 106 that are independent due to logical constraints, or a combination of both.
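
As a non-limiting sketch of such a logical constraint, the following C fragment divides a hypothetical pooled cache into disjoint per-processor slices; the names cache_pool and independent_cache_for and the fixed pool size and processor count are assumptions made for illustration only.

    #include <stddef.h>

    #define CACHE_POOL_BYTES (1 << 20)   /* hypothetical pooled cache size */
    #define N_PROCESSORS     8           /* hypothetical processor count   */

    static unsigned char cache_pool[CACHE_POOL_BYTES];

    /* Logical constraint: each processor receives a disjoint slice of the
     * pool and, by convention, never touches any other processor's slice. */
    unsigned char *independent_cache_for(int processor_id)
    {
        size_t slice = CACHE_POOL_BYTES / N_PROCESSORS;
        return &cache_pool[(size_t)processor_id * slice];
    }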

[0105] Shared memory 108 may include RAM, read only memory (ROM), and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and/or instructions for use by independent-cache collaboration system 102 and/or processor 104. In some non-limiting embodiments or aspects, shared memory 108 may include a computer-readable medium that may store information and/or software related to the operation and use of independent-cache collaboration system 102 and/or processor 104. For example, shared memory 108 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid-state disk, etc.) and/or another type of computer-readable medium. In some non-limiting embodiments or aspects, shared memory 108 may include memory that is mapped to one or more processes (e.g., processors executing software instructions, software applications, and/or the like) such that the one or more processors may read and/or write data to the memory without interacting with an operating system of a computing device (e.g., without calling functions of the operating system). In some non-limiting embodiments or aspects, shared memory 108 may transmit information to and/or receive information from processor 104. In some non-limiting embodiments or aspects, shared memory 108 may include a plurality of shared memory addresses. Each shared memory address may include an identifier of a shared memory location (e.g., a shared memory block) in shared memory 108. In some non-limiting embodiments or aspects, system 100 may include only one instance (e.g., one copy) of shared memory 108. Alternatively, system 100 may include a plurality of instances of shared memory 108, where each instance of shared memory 108 is identical (e.g., multiple copies of shared memory 108 where each instance is a mirrored copy of shared memory 108). Shared memory 108 may not include a cache memory (e.g., a separate and/or integrated cache memory) associated with shared memory 108. In some non-limiting embodiments or aspects, shared memory 108 without a cache memory associated with shared memory 108 may be referred to as cacheless shared memory.

[0106] In some non-limiting embodiments or aspects, shared memory may refer to memory of a computing device that is accessible by a plurality of processors. Shared memory may be accessible by the plurality of processors simultaneously. In some non-limiting embodiments or aspects, shared memory may include a physical memory device (e.g., RAM) that may be accessible by the plurality of processors. The physical memory device may include a plurality of memory locations and each memory location may include and/or correspond to a physical memory address of a plurality of physical memory addresses. In some non-limiting embodiments or aspects, shared memory may include virtual memory (e.g., virtual memory configured in software instructions, and/or the like). Virtual memory may include a plurality of virtual memory locations. Each virtual memory location may include and/or correspond to a virtual memory address of a plurality of virtual memory addresses. Each virtual memory address may be associated with (e.g., may be mapped to) a physical memory address of the plurality of physical memory addresses. In this way, shared memory may include virtual memory which corresponds to a plurality of physical memory addresses for a plurality of physical memory locations where the plurality of physical memory locations span across one or more physical memory devices (e.g., RAM, disk storage, and/or the like).
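
By way of illustration, the following C sketch maps a POSIX shared-memory object into a process's address space, after which ordinary loads and stores reach the shared pages without a per-access operating-system call, consistent with the memory mapping described in the two preceding paragraphs. The object name "/icp_shared" and the helper name map_shared_memory are hypothetical.

    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Map a POSIX shared-memory object into this process's address space. */
    long *map_shared_memory(size_t n_longs)
    {
        int fd = shm_open("/icp_shared", O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return NULL;
        if (ftruncate(fd, (off_t)(n_longs * sizeof(long))) != 0) {
            close(fd);
            return NULL;
        }
        void *p = mmap(NULL, n_longs * sizeof(long),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);   /* the mapping remains valid after the fd is closed */
        return (p == MAP_FAILED) ? NULL : (long *)p;
    }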

[0107] In non-limiting embodiments or aspects, there are a plurality of locations in system 100 where an updated data resource (e.g., the most current and/or up-to-date value of a data resource) may be located. The updated data resource may be located in (1) shared memory (e.g., shared memory 108), in (2) a first processor’s cache memory (e.g., cache memory 106-1 of processor 104-1), or in (3) another processor’s cache memory (e.g., cache memory 106-2 through cache memory 106-n of processor 104-2 through 104-n). Shared memory 108 may consist of memory resources (e.g., shared memory resources) that may be modified by an atomic hardware synchronization primitive (HSP) and/or memory resources that may be locked with a semaphore. Cache memory resources may consist of memory resources (e.g., cache memory resources) that may be independent cache memory resources and/or memory resources that may be common static memory resources. In existing systems, a hardware cache coherence protocol may be used to ensure that data resources stored in shared memory and/or stored in each cache memory associated with each processor remain updated.

[0108] As used herein, a protocol may refer to a set of rules that allows a plurality of connected devices to communicate. Non-limiting embodiments or aspects eliminate the hardware cache coherence protocol by using the independent-cache collaboration algorithm to enforce an independent cache protocol (“ICP”) and to provide three functions that a cache coherence protocol should accomplish. Thus, the ICP requires that a processor in the system (e.g., each processor of a multiprocessor system) ensures that data resources residing in each of the following locations remain coherent and/or consistent: shared memory (ICP Rule 1 ), the cache memory associated with the processor (ICP Rule 2), and all other cache memories associated with each other processor (ICP Rule 3).

[0109] In non-limiting embodiments or aspects, since each processor should have access to an updated data resource, the updated data resource should reside in shared memory. Storing one copy of the updated data resource in shared memory and not storing the updated data resource in any cache memory associated with shared memory results in shared memory being cacheless shared memory. Thus, shared memory (e.g., cacheless shared memory) has the most current and/or up-to-date value of the updated data resource, thus fulfilling ICP Rule 1 requiring coherent and/or consistent shared memory.

[0110] In contrast to non-limiting embodiments or aspects disclosed herein, an existing architecture may have an updated data resource (e.g., multiple copies of an updated data resource) stored in both shared memory and in multiple cache memories associated with multiple processors of the existing architecture resulting in multiple copies of the updated data resource. The existence of multiple copies of the updated data resource is a reason the existing architecture requires a hardware cache coherence protocol. The hardware cache coherence protocol assists in identifying the current and/or most up-to-date copy (e.g., the most current value) of the updated data resource within the multiple copies of the updated data resource. In the existing architecture, by virtue of parallel processing, the multiple copies of the updated data resource may each have different values assigned to each copy of the updated data resource.

[0111] In non-limiting embodiments or aspects, ICP Rule 2 is fulfilled as each cache memory of each processor remains coherent and/or consistent because each processor is associated with an independent cache memory and each independent cache memory is accessed by only one processor (e.g., the processor associated with the cache memory). Since each processor manages its own cache memory (e.g., the independent cache memory associated with the processor), no hardware cache coherence protocol and/or hardware implementation is needed to supervise and/or coordinate the cache memories (e.g., a plurality of independent cache memories, each independent cache memory associated with a processor of a plurality of processors).

[0112] In non-limiting embodiments or aspects as described herein, since shared memory may remain coherent and/or consistent and each independent cache memory associated with each processor remains coherent and/or consistent, then ICP Rule 3 is not required if each cache memory associated with each processor does not communicate with any other cache memory associated with each other processor. Thus, ICP Rule 3 is fulfilled by virtue of ICP Rule 1 and ICP Rule 2 being fulfilled and by eliminating communication of each cache memory with each other cache memory and with other processors within the system. ICP Rule 1 is fulfilled by non-limiting embodiments or aspects because the system stores values of data resources stored in shared memory in only one location (e.g., only in shared memory) and ensures that shared memory is current and/or up-to-date. ICP Rule 2 is fulfilled by non-limiting embodiments or aspects because each cache memory associated with each processor is independent of other cache memories and other processors within the system. Thus, a hardware cache coherence protocol is not required in non-limiting embodiments or aspects. Therefore, independent cache memories are possible with non-limiting embodiments or aspects and the independent-cache collaboration algorithm executed by each processor (e.g., each processor 104) in non-limiting embodiments or aspects as described herein.

[0113] Non-limiting embodiments or aspects may provide a novel instruction set that may allow a software application (e.g., the independent-cache collaboration algorithm) to specify a source and/or destination of an input and/or an output. For example, the instruction set may include an input source (e.g., cache memory 106 and/or shared memory 108) and an output destination (e.g., cache memory 106 and/or shared memory 108). Non-limiting embodiments or aspects may include instructions representing input-output combinations that may be specified by the software application such as: cache memory input source to cache memory output destination (e.g., a source output-cache cache (SO-CC) instruction), cache memory input source to shared memory output destination (e.g., a source output-cache shared (SO-CS) instruction), shared memory input source to cache memory output destination (e.g., a source output-shared cache (SO-SC) instruction), and shared memory input source to shared memory output destination (e.g., a source output-shared shared (SO-SS) instruction).
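
A minimal sketch, in C, of one way the four source/destination combinations could be encoded and dispatched in software; the enum, struct, and function names are hypothetical and are not a definition of the disclosed instruction set.

    /* Hypothetical software encoding of the four source/destination
     * combinations (SO-CC, SO-CS, SO-SC, SO-SS) described above.          */
    typedef enum { MEM_CACHE, MEM_SHARED } mem_kind_t;

    typedef struct {
        mem_kind_t src;   /* where the input operand is read from */
        mem_kind_t dst;   /* where the result is written          */
    } so_instruction_t;

    static const so_instruction_t SO_CC = { MEM_CACHE,  MEM_CACHE  };
    static const so_instruction_t SO_CS = { MEM_CACHE,  MEM_SHARED };
    static const so_instruction_t SO_SC = { MEM_SHARED, MEM_CACHE  };
    static const so_instruction_t SO_SS = { MEM_SHARED, MEM_SHARED };

    long read_operand(so_instruction_t in, long *cache, long *shared, int addr)
    {
        return (in.src == MEM_CACHE) ? cache[addr] : shared[addr];
    }

    void write_result(so_instruction_t in, long *cache, long *shared,
                      int addr, long value)
    {
        (in.dst == MEM_CACHE ? cache : shared)[addr] = value;
    }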

[0114] As an example, processor 104 may execute an instruction specified as SO-SS. Processor 104 may lock a shared memory address (e.g., a shared memory block) containing a shared data resource using a semaphore. The shared data resource may be used for executing the instruction and the shared data resource may be stored in shared memory 108 and/or cache memory 106. If the instruction is not specified as SO-SS, the instruction may be executed as SO-CC. Processor 104 may read a cache data resource from cache memory 106 such that the cache data resource is used for executing the instruction and is subsequently stored in cache memory 106. If processor 104 cannot read the cache data resource from cache memory 106 (e.g., the cache data resource does not exist in cache memory 106), then processor 104 may read a shared data resource to use for executing the instruction and processor 104 may store the shared data resource in cache memory 106 as a cache data resource. In this way, processor 104 (e.g., processor 104-1) may execute an SO-SS instruction while other processors 104 (e.g., processor 104-2, processor 104-3, etc.) may execute different instructions.

[0115] In some non-limiting embodiments or aspects, processor 104 may execute an instruction that causes processor 104 to generate a pointer (e.g., a shared pointer) associated with a shared memory block and causes processor 104 to store the pointer in cache memory 106. In this way, any operation and/or instruction of the independent-cache collaboration algorithm that references the pointer may read the shared memory address (e.g., the shared memory block) that is associated with the pointer and results of the operation and/or instruction may be stored in shared memory 108 and/or cache memory 106.

[0116] In some non-limiting embodiments or aspects, processor 104 may execute an instruction that reads (e.g., retrieves) a data resource from shared memory 108 and writes (e.g., stores) the data resource to shared memory 108 and/or cache memory 106. In some non-limiting embodiments, processor 104 may execute an instruction that reads a data resource from cache memory 106 and writes the data resource to shared memory 108 and/or cache memory 106. For example, processor 104 (e.g., processor 104-1) may execute an instruction (e.g., an SO-CS instruction) that causes processor 104 to read a data resource from cache memory 106 associated with processor 104 (e.g., cache memory 106-1) and causes processor 104 to write the data resource to shared memory 108. In some non-limiting embodiments or aspects, processor 104 may execute an instruction that causes processor 104 to read a plurality of data resources from one or more input sources and causes processor 104 to write the plurality of data resources to one or more output destinations.

[0117] In some non-limiting embodiments or aspects, input source may refer to one input source or a prioritized list of input sources. For example, if a prioritized list of input sources includes cache memory 106 as a first prioritized input source and the prioritized list of input sources includes shared memory 108 as a second prioritized input source, processor 104 may attempt to retrieve an input from cache memory 106 as an input source first, and if no input exists in cache memory 106 or if cache memory 106 is not present, then processor 104 may default to shared memory 108 as the input source to retrieve the input. In some non-limiting embodiments or aspects, output destination may refer to one output destination, a prioritized list of output destinations, and/or more than one output destination.
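
For illustration, the following C sketch realizes a prioritized list of input sources with the independent cache as the first priority and shared memory 108 as the default; the direct-mapped cache structure icache_t and the function name load_prioritized are hypothetical.

    #include <stdbool.h>

    #define CACHE_LINES 256

    typedef struct {               /* hypothetical direct-mapped cache */
        bool valid[CACHE_LINES];
        int  tag[CACHE_LINES];
        long data[CACHE_LINES];
    } icache_t;

    /* Prioritized input source: the processor's independent cache first,
     * then shared memory as the default when the cache has no copy.      */
    long load_prioritized(icache_t *c, long *shared, int addr)
    {
        int line = addr % CACHE_LINES;
        if (c->valid[line] && c->tag[line] == addr)
            return c->data[line];      /* first-priority source: the cache */
        long value = shared[addr];     /* default source: shared memory    */
        c->valid[line] = true;         /* fill the cache for later loads   */
        c->tag[line]   = addr;
        c->data[line]  = value;
        return value;
    }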

[0118] In some non-limiting embodiments or aspects, the instruction set may allow a software application (e.g., the independent-cache collaboration algorithm executing on processor 104) to specify an input source and/or an output destination dynamically (e.g., during execution of the independent-cache collaboration algorithm). For example, the independent-cache collaboration algorithm may specify an input source of shared memory 108 dynamically by causing processor 104 to execute the instruction set by determining a location of an input in real time based on the input and/or based on other input data to processor 104. In this way, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve inputs from an input source and direct outputs to an output destination that may change based on the execution of a task and/or a location of an up-to-date input during execution. Independent-cache collaboration system 102 may dynamically change input sources and output destinations based on performance of a task or based on the resources required to retrieve an input or to transmit an output. For example, independent-cache collaboration system 102 having a plurality of processors 104 may assign a separate input source to each processor 104, and independent-cache collaboration system 102 may dynamically change the input source of each processor 104 during execution of the independent-cache collaboration algorithm based on performance and/or wait times of each processor 104 such that retrieval of inputs from shared memory 108 and/or transmitting of outputs to shared memory 108 is minimized to improve performance of independent-cache collaboration system 102. In this way, independent-cache collaboration system 102 may permit specifying an input source and an output destination via instructions executed on processor 104 (e.g., instructions of the independent-cache collaboration algorithm) and may permit dynamically changing the input source and output destination. Independent-cache collaboration system 102 does not use a hardware cache coherence protocol to determine input sources and output destinations. Existing systems may determine the input sources and output destinations with the hardware cache coherence protocol and the input sources and output destinations may not be explicitly specified.

[0119] The number and arrangement of systems and devices shown in FIG. 1 are provided as an example. There may be additional systems and/or devices, fewer systems and/or devices, different systems and/or devices, and/or differently arranged systems and/or devices than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of system 100 may perform one or more functions described as being performed by another set of systems or another set of devices of system 100.

[0120] Referring now to FIG. 2, shown is a process 200 for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects. The steps shown in FIG. 2 are for example purposes only. It will be appreciated that additional, fewer, different, and/or a different order of steps may be used in non-limiting embodiments or aspects.

[0121] As shown in FIG. 2, at step 202, process 200 may include executing an independent-cache collaboration algorithm. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may execute an independent-cache collaboration algorithm. In some non-limiting embodiments or aspects, processor 104 (e.g., each processor 104 of a plurality of processors 104) may execute the independent-cache collaboration algorithm (e.g., each processor 104 may execute a separate instance of the independent-cache collaboration algorithm). In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may include software instructions (e.g., a software application, a software program, and/or the like) configured to cause processor 104 to perform operations on data resources in shared memory 108 (e.g., a shared memory location of shared memory 108, a shared memory block of shared memory 108, and/or the like). In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may include software instructions configured to cause processor 104 to perform operations on data resources in cache memory 106 (e.g., data resources stored in one or more cache memory blocks of cache memory) of processor 104. In some non-limiting embodiments or aspects, the shared memory (e.g., shared memory 108) may include a plurality of shared memory locations. Each shared memory location may include a shared memory address identifying the shared memory location (e.g., a shared memory block).

[0122] In some non-limiting embodiments or aspects, process 200 may include monitoring a shared queue of tasks. For example, processor 104 may monitor a shared queue of tasks and processor 104 may determine whether to execute a next task in the shared queue of tasks based on processor 104 lacking an assigned task (e.g., processor 104 has completed a previously assigned task). Once processor 104 determines to execute the next task in the shared queue of tasks, the next task is assigned to processor 104 for execution. In some non-limiting embodiments or aspects, processor 104 may acquire system resources required to execute a task. In this way, each processor 104 may execute one task at a given time, and no processors 104 in system 100 would be required to multitask. Each processor 104 may share a queue of tasks, with each processor 104 obtaining the next task in the queue of tasks when processor 104 lacks an assigned task (e.g., when processor 104 completes a previous task).

[0123] In some non-limiting embodiments, processor 104 (e.g., processor 104-1) may monitor a queue of tasks without any other processors 104 monitoring the queue of tasks. For example, processor 104-1 (e.g., a single processor) may monitor the queue of tasks while processor 104-2 to processor 104-n do not monitor the queue of tasks and/or while processor 104-2 to processor 104-n monitor other queues in independent-cache collaboration system 102. Additionally or alternatively, processor 104 (e.g., processor 104-2) may monitor a queue of tasks while other processors 104 monitor the queue of tasks simultaneously. For example, processor 104-2 may monitor a queue of tasks simultaneously while processor 104-3 monitors the same queue of tasks. A queue of tasks is not limited to having any specific number of processors monitor the queue of tasks, and the queue of tasks may be monitored by any processors 104 simultaneously. Likewise, any processor 104 of independent-cache collaboration system 102 may individually monitor a queue of tasks or any processors 104 may simultaneously monitor a queue of tasks. The capability to monitor queues is not limited to any one specific processor 104 of independent-cache collaboration system 102. In this way, a queue of tasks monitored by one processor 104 (e.g., single-processor task queues) may be atomic because processor 104, while monitoring the queue of tasks without any other processors 104 monitoring the queue of tasks, may execute at least one task in a single thread of processor 104. A queue of tasks monitored by processor 104 along with other processors 104 monitoring the queue of tasks simultaneously (e.g., multiple-processor task queues) may include tasks that are able to execute on separate processors 104 concurrently.

[0124] As shown in FIG. 2, at step 204, process 200 may include locking a memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may lock a shared memory location (e.g., a shared memory block) of shared memory 108 based on a shared memory address. In some non-limiting embodiments or aspects, a memory resource may include a memory location (e.g., a shared memory location) and/or a plurality of memory locations (e.g., a plurality of shared memory locations) where one or more data resources (e.g., shared data resources) may be retrieved and/or stored. In some non-limiting embodiments or aspects, a memory location may include a location in memory (e.g., a shared memory block in shared memory 108) corresponding to a memory address (e.g., a virtual memory address and/or a physical memory address). The memory location may be associated with a limit of an amount of data that can be stored in the memory location (e.g., 1 byte).

[0125] In some non-limiting embodiments or aspects, processor 104 may lock the shared memory location of shared memory 108 based on a semaphore. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may execute the independent-cache collaboration algorithm to cause processor 104 to lock the shared memory location of shared memory 108 using a semaphore (e.g., a binary semaphore, a counting semaphore, and/or the like). In some non-limiting embodiments or aspects, the semaphore may be associated with the shared memory location and/or the shared memory address. In some non-limiting embodiments or aspects, each shared memory location of shared memory 108 may be associated with a semaphore of a plurality of semaphores.

[0126] In some non-limiting embodiments or aspects, processor 104 may lock the shared memory location by decrementing a value of the semaphore, where the semaphore is associated with the shared resource. For example, first processor 104-1 may generate the semaphore (e.g., an integer, an unsigned integer, and/or data variable representing an integer) to control a lock to the shared memory location of shared memory 108. In some non-limiting embodiments or aspects, the value of the semaphore equal to 1 may indicate that the shared memory location associated with the semaphore is unlocked. In some non-limiting embodiments or aspects, the value of the semaphore equal to 0 may indicate that the shared memory location associated with the semaphore is locked. For example, first processor 104-1 may decrement the value of the semaphore (e.g., from a first value of 1 to a second value of 0) to indicate that the shared memory location associated with the semaphore is locked. Second processor 104-2 may attempt to lock the shared memory location by decrementing (e.g., based on a request to decrement) the value of the semaphore (e.g., from the second value of 0).

[0127] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may cause processor 104 (e.g., second processor 104-2) to wait (e.g., perform no action with respect to the shared memory location associated with the semaphore) when the value of the semaphore is equal to 0 at a time processor 104 (e.g., second processor 104-2) is attempting to decrement the value of the semaphore. For example, second processor 104-2 may not decrement the value of the semaphore unless the value of the semaphore is equal to 1. While the value of the semaphore is equal to 0, any processor 104 (e.g., second processor 104-2, third processor 104-3, etc.) that attempts to decrement the value of the semaphore may have a request to decrement the value of the semaphore added to a queue of requests (e.g., a queue of requests implementing a first-in-first-out method) where the request is associated with processor 104 that attempted to decrement the value of the semaphore. Once processor 104 that had locked the shared memory location by decrementing the semaphore (e.g., first processor 104-1) is finished performing operations on (e.g., updating, modifying, and/or the like) the shared data resource stored in the shared memory location associated with the semaphore, processor 104 (e.g., first processor 104-1) may increment the value of the semaphore from 0 to 1 (e.g., first processor 104-1 may unlock the shared memory location associated with the semaphore).

[0128] In response to processor 104 (e.g., first processor 104-1) incrementing the value of the semaphore to a value equal to 1, a first request to decrement the value of the semaphore that had been added to the queue of requests may be processed, and processor 104 (e.g., second processor 104-2) associated with the first request to decrement the value of the semaphore may decrement the value of the semaphore to a value equal to 0. In some non-limiting embodiments or aspects, processor 104 (e.g., processor 104-2) may lock the shared memory location in order to perform operations on the shared data resource stored in the shared memory location associated with the semaphore.

[0129] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may lock one or more shared memory blocks of shared memory 108, the one or more shared memory blocks including one or more shared data resources. Locking the one or more shared memory blocks may ensure that no other processor 104 and/or software application (e.g., another instance of the independent-cache collaboration algorithm) may change the contents (e.g., a shared data resource) of the one or more shared memory blocks of shared memory 108. In some non-limiting embodiments or aspects, a shared data resource may refer to data and/or a value stored in a shared memory block of shared memory 108 that may be modified by processor 104 performing a task and/or process. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may lock the one or more shared memory blocks using a semaphore.

[0130] In some non-limiting embodiments or aspects, when locking one or more shared memory blocks of shared memory 108, processor 104 may toggle (e.g., switch between two or more modes of operation) the instruction set that specifies an input source and an output destination to specify other input sources and other output destinations. For example, while shared memory 108 is unlocked by processor 104, the instruction set may specify the input source as cache memory 106 and the output destination as cache memory 106; when processor 104 locks one or more shared memory blocks of shared memory 108, processor 104 may toggle to the instruction set that specifies the input source as shared memory 108 and the output destination as both cache memory 106 and shared memory 108. In this way, the instruction set may be synchronized with the particular operation that processor 104 is performing: when processor 104 locks a shared memory block of shared memory, processor 104 is going to perform operations on the shared memory block, so processor 104 should use shared memory 108 as an input source and cache memory 106 and shared memory 108 as an output destination so that processor 104 may perform an operation on the data resource in the shared memory block and update and transmit the updated data resource back to shared memory 108.

[0131] In order to maintain consistency within shared memory (e.g., shared memory 108), non-limiting embodiments or aspects may lock shared memory using an HSP, or shared memory may be changed by using an HSP. The HSP may enable multiple tasks to run simultaneously on a single processor. When an HSP is implemented on multiprocessors and/or a multiprocessor system that uses shared memory, the HSP may be atomic. An HSP that is atomic may prevent other processors in a multiprocessor system from accessing shared data (e.g., a shared data resource) stored in shared memory after the start of an instruction by a first processor, but before the instruction has been completed by the first processor.

[0132] In some non-limiting embodiments or aspects, processor 104 may change (e.g., modify, update, write, and/or the like) shared memory 108 using a HSP. For example, processor 104 may change shared memory 108 by locking a shared data resource of shared memory 108 using a HSP. In some non-limiting embodiments or aspects, processor 104 may allocate one or more shared memory blocks (e.g., a portion of memory in shared memory 108, the portion of memory configured to store a data resource) of shared memory 108 before changing shared memory 108 (e.g., the one or more shared memory blocks of shared memory 108). Processor 104 may allocate the one or more shared memory blocks by executing a software instruction (e.g., an allocation instruction). For example, processor 104 may allocate (e.g., set aside and/or initialize) a shared memory block in shared memory 108 such that a variable, value, and/or data resource may be stored in the shared memory block. In some non-limiting embodiments or aspects, processor 104 may allocate one or more shared memory blocks using static memory allocation (e.g., allocating memory for variables declared at compile time of a software program). In some non-limiting embodiments or aspects, processor 104 may allocate one or more shared memory blocks using dynamic memory allocation (e.g., allocating memory for variables and/or data resources at runtime of a software program). In this way, independent-cache collaboration system 102 (e.g., processor 104 thereof) may prevent other shared data resources from being impacted by processor 104 changing shared memory 108 (e.g., changing a shared data resource residing in a shared memory block in shared memory 108). A HSP may allow processor 104 to change shared memory 108 (e.g., a shared data resource) without using a lock. For example, processor 104 may change shared memory 108 by using a HSP to perform a conditional swap operation (e.g., compare-and-swap instruction). Upon performing the conditional swap operation, processor 104 may receive a condition code (e.g., the conditional swap operation returns a condition code). In some non-limiting embodiments or aspects, processor 104 may change shared memory 108 by using a spin loop.
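
A hedged C11 sketch of such a conditional swap follows. The name conditional_swap is an illustrative assumption, and atomic_compare_exchange_strong stands in for whatever compare-and-swap HSP the underlying hardware provides; its boolean result plays the role of the condition code.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Attempt one conditional swap on a word in shared memory.
     * Returns true (the "condition code") if the word still held
     * `expected` and was replaced with `desired`; false otherwise. */
    bool conditional_swap(_Atomic uint64_t *shared_word,
                          uint64_t expected, uint64_t desired)
    {
        return atomic_compare_exchange_strong(shared_word, &expected, desired);
    }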

[0133] As shown in FIG. 2, at step 206, process 200 may include retrieving a data resource from the memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve a shared data resource from the shared memory location of shared memory 108 based on a shared data load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may include the shared memory address of the shared memory location of shared memory 108. In some non-limiting embodiments or aspects, the shared data load instruction may include software instructions configured to cause processor 104 to retrieve a value (e.g., a current value, a value at the time of retrieval, and/or the like) of a data resource (e.g., a shared data resource) from shared memory 108 (e.g., by specifying the shared memory location in the shared data load instruction via a shared memory address).

[0134] In some non-limiting embodiments or aspects, the shared data load instruction may include a modified load instruction. For example, the shared data load instruction (e.g., a current value load instruction) may include a load instruction that is modified to access shared memory 108 to get a current (e.g., most up-to-date) value of a shared data resource in shared memory 108. In some non-limiting embodiments or aspects, a load instruction may include an instruction of an instruction set based on hardware of a processor that causes the processor to move data from memory and/or a memory location having a memory address to a register (e.g., a data storage area of the processor where data values are stored such that the processor may operate on the data values) of the processor. In some non-limiting embodiments or aspects, a modified load instruction may be implemented in software instructions such that it causes processor 104 to perform a modified load instruction when processor 104 performs the shared data load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may require shared memory 108 (e.g., shared data resources stored in shared memory locations thereof) to be updated based on a HSP. In some non-limiting embodiments or aspects, processor 104 may perform a shared data load instruction when processor 104 is required to retrieve an updated shared data resource (e.g., a current value, an up-to-date value, and/or the like). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve the shared data resource from shared memory 108 based on a load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may include a same format as a load instruction (e.g., an unmodified load instruction, a standard load instruction, and/or the like). In some non-limiting embodiments or aspects, processor 104 may prevent failure of retrieving the shared data resource from shared memory 108 based on (e.g., by using) the shared data load instruction.
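
As a minimal sketch, assuming shared memory 108 is exposed as a volatile-qualified region (the array name and the mapping are hypothetical, and platform details are omitted), a shared data load in C could be as simple as:

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed: shared memory 108 exposed as an uncached, volatile region. */
    extern volatile uint64_t shared_mem[];

    /* Shared data load: always reaches shared memory for the current
     * value instead of reusing a stale register or cached copy. */
    static inline uint64_t shared_load(size_t word_index)
    {
        return shared_mem[word_index];
    }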

[0135] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may be configured to cause processor 104 to retrieve a data resource (e.g., a shared data resource) from shared memory 108. In some non-limiting embodiments or aspects, processor 104 may retrieve a data resource for a first time after locking the shared memory location of shared memory 108 based on a shared data load instruction. In some non-limiting embodiments or aspects, processor 104 may retrieve the shared data resource from the shared memory location of shared memory 108 by generating a copy of (e.g., copying) the shared data resource such that the shared data resource still resides in the shared memory location of shared memory 108.

[0136] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may be configured to cause processor 104 to allocate memory. For example, the independent-cache collaboration algorithm may be configured to cause processor 104 to execute a memory allocation instruction. Processor 104 may execute a shared memory allocation instruction to allocate memory in shared memory 108. Processor 104 may execute a cache memory allocation instruction to allocate memory in cache memory 106. In some non-limiting embodiments or aspects, processor 104 may execute a memory allocation instruction that may be classified as a shared memory allocation instruction or a cache memory allocation instruction based on an indicator (e.g., an indicator of an address of memory) and/or an address of memory that is to be allocated included in the memory allocation instruction (e.g., whether the address is an address to cache memory or an address to shared memory).

[0137] In this way, a local version of a shared data resource (e.g., a shared data resource loaded into cache memory 106, a cache data resource, etc.) should be retrieved from shared memory 108 after the shared memory location has been locked by the lock (e.g., a semaphore). To ensure data resources in the system remain current and up-to-date, a shared data resource and a corresponding cache data resource residing in cache memory 106 should be current and/or consistent at a time the lock is released. Thus, any local data resource (e.g., a data resource local to processor 104) that is being operated on and/or updated by processor 104 should be stored in shared memory 108 before the lock is released.

[0138] As shown in FIG. 2, at step 208, process 200 may include storing the data resource in cache memory. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may store the shared data resource (e.g., a copy of the shared data resource) in cache memory 106 of processor 104 to generate a cache data resource. In some non-limiting embodiments or aspects, processor 104 may generate the cache data resource by storing the shared data resource (e.g., a copy of the shared data resource) in cache memory 106 of processor 104. A shared data resource that is retrieved and later stored in cache memory 106 will eventually be updated in shared memory 108 if the shared data resource is modified. Expressly, any changes to a shared data resource will not occur solely in cache memory 106. That is, changes to a shared data resource (e.g., a cache data resource residing in cache memory 106 that was previously retrieved from shared memory 108 as a shared data resource) in cache memory 106 will be propagated to shared memory 108. In this way, shared data resources may become cache data resources when transmitted to cache memory 106 for operation, and an updated cache data resource may be stored in shared memory 108 to become an updated shared data resource.

[0139] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may allocate memory in shared memory 108 (e.g., cacheless shared memory) and/or in cache memory 106 (e.g., independent cache memory) by performing a function call (e.g., malloc() in the C programming language). Processor 104 may allocate a shared memory block of shared memory 108 by executing a software instruction to allocate cacheless shared memory. Processor 104 may allocate memory in a cache memory block of cache memory 106 by executing a software instruction to allocate independent cache memory. A software instruction to allocate memory (e.g., cacheless shared memory and/or independent cache memory) may include a software instruction similar to malloc() and/or the like. In some non-limiting embodiments or aspects, processor 104 may generate a first memory address (e.g., a memory address associated with a shared memory block) based on allocating memory in shared memory 108, and processor 104 may generate a second memory address (e.g., a memory address associated with a cache memory block) based on allocating memory in cache memory 106. In some non-limiting embodiments, the first memory address and the second memory address are different, such that the first memory address may be identified as a memory address associated with a shared memory block and the second memory address may be identified as a memory address associated with a cache memory block.
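
The C sketch below shows one way the two kinds of allocation could be told apart by address alone. The region bases and the helper names are assumptions for illustration, not a definitive implementation of the allocation instructions.

    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed, non-overlapping address ranges for the two memories. */
    #define CACHE_REGION_BASE  0x40000000u   /* cache memory blocks  */
    #define SHARED_REGION_BASE 0x80000000u   /* shared memory blocks */

    /* With disjoint ranges, an address by itself identifies which
     * memory a block was allocated from. */
    static bool is_shared_address(uintptr_t addr)
    {
        return addr >= SHARED_REGION_BASE;
    }

    static bool is_cache_address(uintptr_t addr)
    {
        return addr >= CACHE_REGION_BASE && addr < SHARED_REGION_BASE;
    }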

[0140] In order to facilitate allocation of cache memory, the independent-cache collaboration algorithm may be configured to execute a cache memory allocation instruction that allocates memory from a cache memory pool (e.g., a pool of cache memory resources, cache memory resource pool 416) that supplies cache memory addresses to some or all other processors 104 of the plurality of processors 104. The cache memory pool may supply an address associated with a cache memory (e.g., a cache memory block, cache memory 106, and/or the like) to one or more processors 104. In some non-limiting embodiments or aspects, the cache memory allocation instruction may perform an atomic HSP on a shared memory location that adds a request to a queue of cache memory allocation requests. The queue of cache memory allocation requests may be continually monitored by a task that executes as a stand-alone task on a processor 104 of the plurality of processors 104 (e.g., processor 104-1).

[0141] In some non-limiting embodiments or aspects, the shared memory allocation instruction may perform an atomic HSP on a shared memory location that adds a request to a queue of shared memory allocation requests. The queue of shared memory allocation requests is continually monitored by a task that executes as a stand-alone task on processor 104 of the plurality of processors 104 (e.g., processor 104-2). In this way, at least one first processor 104 (e.g., processor 104-1) of the plurality of processors 104 may execute a task that continually monitors the queue of cache memory allocation requests and at least one second processor 104 (e.g., processor 104-2) of the plurality of processors 104 may execute a task that continually monitors the queue of shared memory allocation requests. Thus, independent-cache collaboration system 102 may update shared memory queues instead of initiating tasks. Although each processor is described as executing a task to continually monitor a single queue, in some non-limiting embodiments or aspects, a single processor may continually monitor (e.g., via a task) more than one queue in a prioritized sequence.
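
A simplified C11 sketch of this request-queue pattern follows. The queue layout, the atomic fetch-and-add used as the enqueue HSP, and the busy-wait monitoring loop are illustrative assumptions; a real queue would also need bounds handling and a completion path.

    #include <stdatomic.h>
    #include <stddef.h>

    #define QUEUE_SLOTS 64

    struct alloc_request { size_t size; void *result; };

    struct request_queue {
        _Atomic unsigned head;                      /* next slot to claim */
        unsigned tail;                              /* next slot to serve */
        struct alloc_request *_Atomic slot[QUEUE_SLOTS];
    };

    /* Any processor: atomically claim a slot and publish a request. */
    void enqueue_request(struct request_queue *q, struct alloc_request *r)
    {
        unsigned i = atomic_fetch_add(&q->head, 1u) % QUEUE_SLOTS;
        atomic_store(&q->slot[i], r);
    }

    /* Dedicated processor (e.g., 104-1 or 104-2): a stand-alone task
     * that continually monitors the queue and services requests. */
    void monitor_queue(struct request_queue *q)
    {
        for (;;) {
            unsigned i = q->tail % QUEUE_SLOTS;
            struct alloc_request *r = atomic_load(&q->slot[i]);
            if (r == NULL)
                continue;                  /* nothing yet: keep polling */
            /* service the request here, e.g., perform the allocation */
            atomic_store(&q->slot[i], NULL);
            q->tail++;
        }
    }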

[0142] In order to facilitate allocation of shared memory, the independent-cache collaboration algorithm may be configured to differentiate between shared memory and cache memory, because shared memory and cache memory reside in different locations. Therefore, in some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof executing an instance of the independent-cache collaboration algorithm) may determine whether to allocate shared memory or cache memory based on a memory allocation instruction. In this way, independent-cache collaboration system 102 may ensure that all memory allocation defaults to shared memory for migration of data resources; however, independent-cache collaboration system 102 may still allocate cache memory for performance purposes.

[0143] Additionally, independent-cache collaboration system 102 (e.g., processor 104 thereof executing the independent-cache collaboration algorithm) may be configured to differentiate between a location of shared memory and a location of cache memory by determining whether the memory is shared between processors (e.g., shared memory) or whether the memory is not shared between processors (e.g., cache memory). Processor 104 may determine that memory is shared by detecting a memory location (e.g., a shared memory block) in shared memory, and processor 104 may determine that memory is not shared by detecting a memory location (e.g., a cache memory block) in cache memory 106 of processor 104. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may designate an address of a memory location as a shared address. In some non-limiting embodiments or aspects, processor 104 may designate an address of a memory location as a shared address such that when processor 104 executes an instruction that updates the shared address, processor 104 dynamically updates both the shared address (e.g., an address to a shared memory block of shared memory 108) and an address in cache memory 106 of processor 104 that contains the same data resource as the shared address.
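
As a hedged illustration of that dynamic dual update, the C sketch below writes a value to both the designated shared address and the corresponding location in the processor's cache memory. The lookup helper cache_alias_of is hypothetical; nothing here is taken from the embodiments beyond the write-to-both behavior just described.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical: returns the cache-memory location (if any) that
     * holds the same data resource as the given shared address. */
    volatile uint64_t *cache_alias_of(volatile uint64_t *shared_addr);

    /* Update a designated shared address and dynamically update the
     * matching cache copy so the two stay consistent. */
    void store_to_shared_address(volatile uint64_t *shared_addr, uint64_t v)
    {
        *shared_addr = v;                        /* shared memory 108 */
        volatile uint64_t *alias = cache_alias_of(shared_addr);
        if (alias != NULL)
            *alias = v;                          /* cache memory 106  */
    }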

[0144] As shown in FIG. 2, at step 210, process 200 may include performing an operation on the data resource in cache memory. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may perform an operation on the cache data resource in cache memory 106 of processor 104 to generate an updated cache data resource in cache memory 106 of processor 104.

[0145] As shown in FIG. 2, at step 212, process 200 may include updating the data resource in the memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at (e.g., stored in) the shared memory location of shared memory 108 based on the updated cache data resource in cache memory 106 of processor 104. In some non-limiting embodiments or aspects, processor 104 may update the shared data resource based on the updated cache data resource by generating a copy of (e.g., copying) the updated cache data resource from cache memory 106 of processor 104 and overwriting the shared data resource at the shared memory location of shared memory 108 with the copy of the updated cache data resource (e.g., by deleting the shared data resource in the shared memory location and storing the copy of the updated cache data resource in the shared memory location of shared memory 108). In this way, processor 104 may update and/or overwrite shared data resources only at the shared memory location of shared memory 108 that processor 104 had previously locked based on the shared memory address of the shared memory location. While a first processor 104-1 has locked the shared memory location, other processors 104 of the plurality of processors 104 may not update and/or overwrite shared data resources at the shared memory location.

[0146] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 based on the updated cache data resource in cache memory 106 of processor 104 based on a HSP. For example, processor 104 may update the shared data resource at the shared memory address of shared memory 108 based on the updated cache data resource in cache memory 106 of processor 104 by using a HSP to perform the update of the shared data resource (e.g., an atomic operation, atomic exchange, test-and-set, compare and exchange 8 bytes (CMPXCHG8B), and/or the like). In some non-limiting embodiments or aspects, a HSP may function in adherence to the independent-cache collaboration algorithm. The HSP may provide updates based on one or more conditions (e.g., rules). In this way, the HSP may provide an atomic conditional update.

[0147] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 by storing the updated cache data resource at the shared memory address of shared memory 108 based on a modified HSP. For example, processor 104 may update the shared data resource at the shared memory address of shared memory 108 by storing the updated cache data resource (e.g., overwriting the shared data resource, storing the updated cache data resource as an updated shared data resource) using a modified HSP to generate an updated shared data resource. In some non-limiting embodiments or aspects, the modified HSP may require the updated shared data resource to be retrieved by other processors 104 based on a shared data load instruction. In this way, independent-cache collaboration system 102 may ensure that each processor 104 is retrieving a current (e.g., most up-to-date) value of the shared data resource (e.g., and/or the updated shared data resource). In some non-limiting embodiments or aspects, the modified HSP may include a same format as a HSP (e.g., unmodified HSP, standard HSP, and/or the like).

[0148] As shown in FIG. 2, at step 214, process 200 may include unlocking the memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may unlock the shared memory location of shared memory 108 based on the shared memory address. In some non-limiting embodiments or aspects, processor 104 may unlock the shared memory location after processor 104 has updated the shared data resource at the shared memory location (e.g., once processor 104 is finished updating and/or operating on the shared data resource at the shared memory location).
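
Gathering steps 204 through 214 of process 200 into one place, the C sketch below shows the overall shape of the sequence under stated assumptions: sem_lock and sem_unlock stand in for the semaphore-based lock, shared_load for the shared data load instruction, and shared_store for the HSP-backed update. None of these names come from the source, and the increment is an arbitrary example operation.

    #include <stdint.h>

    /* Hypothetical primitives assumed by this sketch. */
    void     sem_lock(uintptr_t shared_addr);                  /* step 204: lock     */
    void     sem_unlock(uintptr_t shared_addr);                /* step 214: unlock   */
    uint64_t shared_load(uintptr_t shared_addr);               /* step 206: retrieve */
    void     shared_store(uintptr_t shared_addr, uint64_t v);  /* step 212: update   */

    static uint64_t cache_copy;  /* cache data resource in cache memory 106 */

    void update_shared_resource(uintptr_t shared_addr)
    {
        sem_lock(shared_addr);                  /* lock the shared memory location */
        cache_copy = shared_load(shared_addr);  /* step 208: store copy in cache   */
        cache_copy += 1;                        /* step 210: operate on cache copy */
        shared_store(shared_addr, cache_copy);  /* step 212: write the update back */
        sem_unlock(shared_addr);                /* step 214: release the lock      */
    }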

[0149] Referring now to FIG. 3, shown is a process 300 for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects. The steps shown in FIG. 3 are for example purposes only. It will be appreciated that additional steps, fewer steps, different steps, and/or a different order of steps may be used in non-limiting embodiments or aspects.

[0150] As shown in FIG. 3, at step 302 (e.g., step (a)), process 300 may include executing an independent-cache collaboration algorithm. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may execute the independent-cache collaboration algorithm. In some non-limiting embodiments or aspects, processor 104 (e.g., each processor 104 of a plurality of processors 104) may execute the independent-cache collaboration algorithm. In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may include software instructions (e.g., a software application) configured to cause processor 104 to perform operations on data resources in a shared memory location (e.g., shared memory 108). In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm may include software instructions configured to cause processor 104 to perform operations on data resources in cache memory 106 of processor 104. In some non-limiting embodiments or aspects, the shared memory (e.g., shared memory 108) may include a plurality of shared memory locations. Each shared memory location may include a shared memory address identifying the shared memory location.

[0151] As shown in FIG. 3, at step 304 (e.g., step (b)), process 300 may include retrieving a data resource from a memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve a shared data resource from a shared memory address of shared memory 108 based on a shared data load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may include the shared memory address of the shared memory location of shared memory 108. In some non-limiting embodiments or aspects, the shared data load instruction may include software instructions configured to cause processor 104 to retrieve a value (e.g., a current value, a value at the time of retrieval, and/or the like) of a data resource (e.g., a shared data resource) from shared memory 108 (e.g., by specifying the shared memory location in the shared data load instruction via a shared memory address). For example, a shared data load instruction may include a memory address that includes an indicator bit, the indicator bit identifying the memory address as a shared memory address or a cache memory address. In some non-limiting embodiments, independent-cache collaboration system 102 may translate a virtual memory address to a physical memory address. The physical memory address may include an indicator bit that identifies the physical memory address as a shared memory address or a cache memory address.
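
The indicator-bit idea can be sketched in C as below; treating the top bit of the physical address as the indicator is purely an assumption for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed convention: the top bit of a physical memory address
     * marks it as a shared memory address rather than a cache one. */
    #define SHARED_INDICATOR_BIT (1ull << 63)

    static bool is_shared(uint64_t phys_addr)
    {
        return (phys_addr & SHARED_INDICATOR_BIT) != 0;
    }

    static bool is_cache(uint64_t phys_addr)
    {
        return !is_shared(phys_addr);
    }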

[0152] In some non-limiting embodiments or aspects, processor 104 may retrieve the shared data resource from the shared memory location of shared memory 108 by generating a copy of (e.g., copying) the shared data resource such that the shared data resource still resides in the shared memory location of shared memory 108. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve the shared data resource based on an atomic operation (e.g., retrieving the shared data resource is an atomic operation).

[0153] As used herein, an atomic operation may refer to an operation (e.g., an operation executed by a processor) that may execute completely independent of any other operation and/or process. For example, an atomic operation will be completely executed until the operation is complete and other operations that may need to be executed cannot interrupt the atomic operation. In a multiprocessor system (e.g., independent-cache collaboration system 102), an atomic operation executed by a first processor 104-1 cannot be interrupted by another operation executed by a second processor 104-2. For example, the second processor 104-2 may be prohibited from accessing data (e.g., reading data, copying data, overwriting data, and/or the like) that is being processed by the atomic operation until the atomic operation is complete. The first processor 104-1 may process the data using the atomic operation to prohibit other operations and/or processors 104 from updating the data during the atomic operation. The independent execution of the operation may allow for the processing of data to be carried out in full (e.g., a complete operation) without interruption. In this way, independent-cache collaboration system 102 may guarantee that operations are executed completely without interruption by another operation and/or process and may ensure that the values of shared data resources of shared memory 108 remain updated (e.g., current, up-to-date) for each processor 104 to retrieve and use for operations and/or calculations. In non-limiting embodiments disclosed herein, a shared data resource residing in shared memory 108 may be updated with an atomic HSP, using either a semaphore or a spin loop.
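
As a concrete, hedged illustration of the spin loop option, the C11 sketch below implements a basic spin lock with atomic_flag. This is a standard pattern shown for orientation, not necessarily the HSP the embodiments use.

    #include <stdatomic.h>

    static atomic_flag resource_lock = ATOMIC_FLAG_INIT;

    /* Spin loop: repeat the atomic test-and-set until the flag is
     * acquired; exactly one processor wins and proceeds alone. */
    void spin_lock(void)
    {
        while (atomic_flag_test_and_set(&resource_lock))
            ;  /* busy-wait until the holder releases the flag */
    }

    void spin_unlock(void)
    {
        atomic_flag_clear(&resource_lock);
    }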

[0154] In some non-limiting embodiments or aspects, the shared data load instruction may include a modified load instruction. For example, the shared data load instruction (e.g., a current value load instruction) may include a load instruction that is modified to access shared memory 108 to get a current (e.g., most up-to-date) value of a shared data resource in shared memory 108. In some non-limiting embodiments or aspects, a load instruction may include an instruction of an instruction set based on hardware of a processor that causes the processor to move data from memory and/or a memory location having a memory address to a register (e.g., a data storage area of the processor where data values are stored such that the processor may operate on the data values) of the processor. In some non-limiting embodiments or aspects, a modified load instruction may be implemented in software instructions such that it causes processor 104 to perform a modified load instruction when processor 104 performs the shared data load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may require shared memory 108 (e.g., shared data resources stored in shared memory locations thereof) to be updated based on a HSP. In some non-limiting embodiments or aspects, processor 104 may perform a shared data load instruction when processor 104 is required to retrieve an updated shared data resource (e.g., a current value, an up-to-date value, and/or the like). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may retrieve the shared data resource from shared memory 108 based on a load instruction. In some non-limiting embodiments or aspects, the shared data load instruction may include a same format as a load instruction (e.g., an unmodified load instruction, a standard load instruction, and/or the like).

[0155] As shown in FIG. 3, at step 306 (e.g., step (c)), process 300 may include storing the data resource in cache memory. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may store the shared data resource (e.g., a copy of the shared data resource) in cache memory 106 of processor 104 to generate a cache data resource. In some non-limiting embodiments or aspects, processor 104 may generate the cache data resource by storing the shared data resource (e.g., a copy of the shared data resource) in cache memory 106 of processor 104. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may store the shared data resource in cache memory 106 based on an atomic operation (e.g., storing the shared data resource in cache memory 106 is an atomic operation).

[0156] As shown in FIG. 3, at step 308 (e.g., step (d)), process 300 may include performing an operation on the data resource in cache memory. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may perform an operation (e.g., add, subtract, and/or the like) on the cache data resource in cache memory 106 of processor 104 to generate an updated cache data resource in cache memory 106 of processor 104. In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may perform an operation on the cache data resource based on an atomic operation (e.g., performing an operation on the cache data resource is an atomic operation).

[0157] As shown in FIG. 3, at step 310 (e.g., step (e)), process 300 may include detecting that the data resource is modified in the memory resource. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may detect that the shared data resource at the shared memory address of shared memory 108 is modified (e.g., updated, modified by another processor, and/or the like). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may detect that the shared data resource at the shared memory address of shared memory 108 is modified based on an atomic operation (e.g., the detection that the shared data resource is modified is an atomic operation). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may detect that the shared data resource at the shared memory address of shared memory 108 is updated using a spin loop.

[0158] As used herein, a spin loop may refer to software instructions that may cause a processor to monitor a data resource and perform no action on the data resource (e.g., wait) until the data resource has been modified. In the case of independent-cache collaboration system 102, processor 104 may monitor the shared data resource at the shared memory address of shared memory 108, detect when the shared data resource has been modified by another processor 104, and then take the action of updating the shared data resource with an updated cache data resource from cache memory 106. In this way, independent-cache collaboration system 102 may keep the value of the shared data resource at the shared memory address of shared memory 108 updated (e.g., current) without allowing interfering updates between different processors 104.

[0159] As shown in FIG. 3, at step 312 (e.g., step (f)), process 300 may include repeating steps 304-308. For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may repeat the steps of process 300 including the step of retrieving the shared data resource, the step of storing the shared data resource in cache memory to generate a cache data resource, and the step of performing an operation on the cache data resource (e.g., steps 304-308 of process 300) in response to independent-cache collaboration system 102 detecting that the shared data resource at the shared memory address of shared memory 108 is modified (e.g., and/or has been previously modified by another process, processor, independent-cache collaboration algorithm, and/or the like). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may repeat the steps of process 300 (e.g., steps 304-308, steps (b)-(d)) based on an atomic operation (e.g., steps 304-308 and/or steps (b)-(d) are completed as an atomic operation, for example after processor 104 detects that the shared data resource is modified in shared memory 108).
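
A hedged C11 sketch of this retry pattern, covering steps (b) through (g) of process 300, follows: load the shared value, operate on a local (cache-resident) copy, and publish the result with a compare-and-swap, repeating from the load whenever the shared value was modified in the meantime. The increment and all names are illustrative only.

    #include <stdatomic.h>
    #include <stdint.h>

    /* Steps (b)-(g): retrieve, copy, operate, and update, repeating
     * whenever the shared value changed underneath us. */
    void retry_update(_Atomic uint64_t *shared_word)
    {
        uint64_t observed, updated;
        do {
            observed = atomic_load(shared_word);  /* (b) retrieve current value */
            updated  = observed;                  /* (c) local cache copy       */
            updated += 1;                         /* (d) operate on the copy    */
            /* (e)/(f): if another processor modified the word, the exchange
             * fails and steps (b)-(d) repeat; otherwise (g) the shared data
             * resource is updated atomically. */
        } while (!atomic_compare_exchange_weak(shared_word, &observed, updated));
    }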

[0160] As shown in FIG. 3, at step 314 (e.g., step (g)), process 300 may include updating the data resource in the memory resource (e.g., the data resource stored in the memory resource). For example, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 based on the updated cache data resource in the cache memory of the at least one processor. In some non-limiting embodiments or aspects, processor 104 may retrieve the updated cache data resource from cache memory 106 by generating a copy of (e.g., copying) the updated cache data resource such that the updated cache data resource still resides in cache memory 106. Processor 104 may transmit the updated cache data resource to shared memory 108 and processor 104 may overwrite the shared data resource at the shared memory address of shared memory 108 with the copy of the updated cache data resource such that the shared data resource includes a data value of the updated cache data resource after the shared data resource has been updated based on the updated cache data resource (e.g., after the shared data resource has been overwritten at the shared memory address of shared memory 108). In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 based on an atomic operation (e.g., processor 104 updating the shared data resource at the shared memory address of shared memory 108 is an atomic operation).

[0161] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 based on the updated cache data resource in cache memory 106 of processor 104 based on a HSP. For example, processor 104 may update the shared data resource at the shared memory address of shared memory 108 based on the updated cache data resource in cache memory 106 of processor 104 by using a HSP to perform the update of the shared data resource (e.g., an atomic operation, atomic exchange, test-and-set, and/or the like).

[0162] In some non-limiting embodiments or aspects, independent-cache collaboration system 102 (e.g., processor 104 thereof) may update the shared data resource at the shared memory address of shared memory 108 by storing the updated cache data resource at the shared memory address of shared memory 108 based on a modified HSP. For example, processor 104 may update the shared data resource at the shared memory address of shared memory 108 by storing the updated cache data resource (e.g., overwriting the shared data resource, storing the updated cache data resource as an updated shared data resource) using a modified HSP to generate an updated shared data resource. In some non-limiting embodiments or aspects, the modified HSP may require the updated shared data resource to be retrieved by other processors 104 based on a shared data load instruction. In this way, independent-cache collaboration system 102 may ensure that each processor 104 is retrieving a current (e.g., most up-to-date) value of the shared data resource (e.g., and/or the updated shared data resource). In some non-limiting embodiments or aspects, the modified HSP may include a same format as a HSP (e.g., unmodified HSP, standard HSP, and/or the like).

[0163] Referring now to FIG. 4, shown is a system 400 for maintaining coherent and/or consistent cache memory with independent processor caches in a scalable multiprocessor system absent a hardware cache coherence protocol according to some non-limiting embodiments or aspects. System 400 may include shared memory 408, processor pool 414 (e.g., a group of processors in a multiprocessor system), and cache memory resource pool 416. Cache memory resource pool 416 may include a group of cache memory resources 406 (e.g., in a multiprocessor system), each cache memory resource 406 being part of the group and each cache memory resource 406 functioning as at least one independent cache memory of at least one processor 404 of processor pool 414. That is, each cache memory resource 406 of cache memory resource pool 416 may be logically independent and may be associated with at least one processor 404 of processor pool 414.

[0164] Shared memory 408 may include a queue. The queue may include one or more requests (e.g., one or more stored requests received from one or more processors in processor pool 414). For example, shared memory 408 may include queue of shared memory allocation requests 410 and/or queue of cache memory allocation requests 412. Processor pool 414 may include processor 404-1 to processor 404-n (e.g., a plurality of processors, referred to individually as processor 404 and collectively as processors 404 where appropriate). Cache memory resource pool 416 may include one or more cache memory resources (e.g., a plurality of cache memory resources), where each cache memory resource may include a cache memory (e.g., cache memory 406) of at least one processor of processor pool 414.

[0165] In some non-limiting embodiments or aspects, cache memory resource 406 may include (e.g., cache memory resource 406 may store, contain, and/or the like) one or more cache data resources stored in cache memory resource 406 by processor 404. In some non-limiting embodiments or aspects, each cache memory resource 406 of cache memory resource pool 416 may be associated with an address (e.g., an address of cache memory). The address may be associated with at least one processor 404 of processor pool 414 such that only the at least one processor 404 may transmit data to and/or access data from cache memory resource 406 of cache memory resource pool 416 associated with the address, and other processors 404 of processor pool 414 may not communicate with cache memory resource 406. In this way, each processor 404 may be associated with an independent cache memory resource to which other processors do not have access (e.g., other processors 404 cannot communicate with cache memory resource 406, transmit data, retrieve data, and/or the like). In some non-limiting embodiments or aspects, processors 404 may be the same as or similar to processors 104, and cache memory resources 406 may be the same as or similar to cache memories 106.

[0166] In some non-limiting embodiments or aspects, shared memory 408 may include (e.g., shared memory 408 and/or a shared memory block of shared memory 408 may store, contain, and/or the like) one or more shared data resources stored in shared memory 408 by processor 404. In some non-limiting embodiments or aspects, queue of shared memory allocation requests 410 may include one or more shared memory allocation requests, the one or more shared memory allocation requests representing a request by processor 404 to allocate shared memory 408 (e.g., a block of shared memory 408) based on an address of shared memory 408. For example, processor 404-1 may initiate a request to allocate shared memory in shared memory 408. Processor 404-1 may generate a request (e.g., a shared memory allocation request) containing an indicator and/or an address of shared memory 408 (e.g., an address contained in shared memory 408). Processor 404-1 may transmit the request to queue of shared memory allocation requests 410 to be processed. In the described example, processor 404-2 may execute a task to continually monitor queue of shared memory allocation requests 410 such that processor 404-2 may process the one or more shared memory allocation requests (e.g., executing a malloc() function call included in the request) to allocate shared memory (e.g., a shared memory block) in shared memory 408 for use by one or more processors 404.

[0167] In some non-limiting embodiments, each shared memory block that is allocated may be a different size or a same size of shared memory compared to other shared memory blocks of shared memory 408. The size of the shared memory block (e.g., an amount of shared memory allocated in shared memory 408) may be determined based on a size value included in a shared memory allocation request. In some non-limiting embodiments or aspects, the size value included in a shared memory allocation request may be based on the capacity and/or processing requirements of processor 404 (e.g., the processor generating the shared memory allocation request). In some non-limiting embodiments or aspects, once the shared memory allocation request generated by processor 404 has been processed (e.g., by another processor), processor 404 may receive a shared memory allocation response. The shared memory allocation response may include a pointer to a shared memory block. For example, processor 404 may receive a shared memory allocation response including a pointer to a shared memory block of shared memory 408 (e.g., by retrieving a response generated by another processor and stored in shared memory 408). In some non-limiting embodiments or aspects, the shared memory allocation request may include a malloc() function call.

[0168] In some non-limiting embodiments or aspects, queue of cache memory allocation requests 412 may include one or more cache memory allocation requests, the one or more cache memory allocation requests representing a request by processor 404 to allocate cache memory resource 406 (e.g., a block of cache memory) based on an address of cache memory resource 406 in cache memory resource pool 416. For example, processor 404-1 may initiate a request to allocate cache memory resource 406-1, such that cache memory resource 406-1 is associated with processor 404-1. Processor 404-1 may generate a request (e.g., a cache memory allocation request) containing an indicator and/or an address of cache memory (e.g., an address contained in cache memory resource pool 416) corresponding to cache memory resource 406-1. Processor 404-1 may transmit the request to queue of cache memory allocation requests 412 to be processed. In the described example, processor 404-3 may execute a task to continually monitor queue of cache memory allocation requests 412 such that processor 404-3 may process the one or more cache memory allocation requests to allocate cache memory resource 406-1 in cache memory resource pool 416 to allocate cache memory (e.g., a cache memory block) for some or all other processors 404, such that each cache memory resource 406 is associated with at least one (e.g., exactly one) processor 404. Exactly one processor 404 may be associated with exactly one cache memory resource 406 based on the indicator and/or address associated with cache memory resource 406 included in the request generated by processor 404.

[0169] In some non-limiting embodiments, each cache memory resource 406 may be a different size or a same size of cache memory compared to other cache memory resources 406 of cache memory resource pool 416. The size of the cache memory resource (e.g., an amount of cache memory allocated in cache memory resource pool 416) may be determined based on a size value included in a cache memory allocation request. In some non-limiting embodiments or aspects, the size value included in a cache memory allocation request may be based on the capacity and/or processing requirements of processor 404 (e.g., the processor generating the cache memory allocation request). In some non-limiting embodiments or aspects, once the cache memory allocation request generated by processor 404 has been processed (e.g., by another processor), processor 404 may receive a cache memory allocation response. The cache memory allocation response may include a pointer to a cache memory resource (e.g., cache memory resource 406, a block of cache memory, and/or the like). For example, processor 404 may receive a cache memory allocation response including a pointer to cache memory resource 406 from shared memory 408 (e.g., by retrieving a response generated by another processor and stored in shared memory 408). In some non-limiting embodiments or aspects, the cache memory allocation request may include a malloc() function call.
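
One possible shape for these request and response records in C is sketched below. The field names, the malloc()-backed service step, and the response-carries-a-pointer convention are assumptions chosen for illustration.

    #include <stddef.h>
    #include <stdlib.h>

    /* A memory allocation request, as a processor might enqueue it. */
    struct mem_alloc_request {
        int    requester;  /* which processor 404 generated the request */
        size_t size;       /* size value: how much memory to allocate   */
    };

    /* The matching response, later retrieved by the requester. */
    struct mem_alloc_response {
        void *block;       /* pointer to the allocated memory block */
    };

    /* Servicing task (run by the monitoring processor): satisfy one
     * request and produce a response holding a pointer to the block. */
    struct mem_alloc_response serve(const struct mem_alloc_request *req)
    {
        struct mem_alloc_response rsp;
        rsp.block = malloc(req->size);  /* malloc()-style allocation */
        return rsp;
    }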

[0170] In some non-limiting embodiments or aspects, at least one processor 404 may execute a task to continually monitor queue of shared memory allocation requests 410 and/or queue of cache memory allocation requests 412. Additionally or alternatively, more than one processor 404 may each execute a task to continually monitor queue of shared memory allocation requests 410 and/or queue of cache memory allocation requests 412. For example, exactly one processor 404 or more than one processor 404 may each execute a task to monitor multiple queues in a prioritized sequence.

[0171] Processor 404 (e.g., processor 404-1 to processor 404-n) may include at least one processor (e.g., a multi-core processor), such as a central processing unit (CPU), an accelerated processing unit (APU), a graphics processing unit (GPU), a microprocessor, and/or the like. In some non-limiting embodiments or aspects, processor 404 may include at least one processor having a single core (e.g., a single-core processor) or at least one processor having multiple cores (e.g., a multiprocessor, a multi-core processor, a processor including more than one core, and/or the like). In some non-limiting embodiments or aspects, processor 404 may include at least one core (e.g., a core of a processor associated with a dedicated L1 cache) that is a component of (e.g., part of) a single-core processor or a multiprocessor.

[0172] In some non-limiting embodiments or aspects, processor 404 (e.g., processors 404-1 to 404-n) may be programmed to perform one or more steps of methods described herein. In some non-limiting embodiments or aspects, processors 404 may include one or more processors executing instructions (e.g., software instructions) that cause processors 404 to perform one or more steps of methods as described herein. In some non-limiting embodiments or aspects, processor 404 may be in communication with cache memory resource 406 and/or cache memory resource pool 416. In some non-limiting embodiments or aspects, processor 404 may be capable of receiving information (e.g., data, data resources, and/or the like) from and/or communicating (e.g., transmitting) information to cache memory resource 406 and/or cache memory resource pool 416. In some non-limiting embodiments or aspects, processor 404 may execute an instance of an independent-cache collaboration algorithm (e.g., an instance of an algorithm configured to keep information and/or data resources stored in cache memory coherent and/or consistent with information and/or data resources stored in shared memory 408 for processing by processor 404 and/or the like).

[0173] In some non-limiting embodiments or aspects, at least one processor 404 may execute an operating system where the operating system is a multitasking operating system which may require a hardware cache coherence protocol, but the operating system may be modified to execute the independent-cache collaboration algorithm. For example, at least one processor 404 may execute the operating system where the operating system is modified such that the operating system does not require a hardware cache coherence protocol and the operating system is modified to execute the independent-cache collaboration algorithm (e.g., a modified multitasking operating system). In some non-limiting embodiments or aspects, at least one processor 404 may execute an operating system where the operating system is a non-multitasking atomic operating system which executes tasks from a shared memory queue (e.g., a queue of one or more tasks stored in shared memory 408). In some non-limiting embodiments or aspects, at least one processor 404 may execute either a modified multitasking operating system or a non-multitasking atomic operating system that may execute tasks from a shared memory queue. In some non-limiting embodiments or aspects, at least one processor 404 may add tasks to a shared memory queue, independent of whether processor 404 is executing a modified multitasking operating system or a non-multitasking atomic operating system. In some non-limiting embodiments or aspects, at least one processor 404 may continue execution while the at least one processor 404 requires a result (e.g., waits for a result that may be required for further execution) from a task that the at least one processor 404 added to a shared memory queue. To continue execution while the at least one processor 404 requires a result from the task, the at least one processor 404 may wait until the task added to the shared memory queue has a result (e.g., a result associated with the task added to the shared memory queue) stored in a shared memory location of shared memory 408. In this manner, the at least one processor 404 and tasks the at least one processor 404 has initiated may run and/or be executed concurrently. In some non-limiting embodiments or aspects, processor pool 414 may not contain any processors 404 that execute any operating systems that require multitasking. In this way, multitasking of conventional computing systems may be replaced by concurrent processing.
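
A hedged C11 sketch of this add-a-task-then-wait-for-its-result pattern follows. The task record, the RESULT_PENDING sentinel, and enqueue_task are illustrative assumptions standing in for the shared memory queue machinery.

    #include <stdatomic.h>
    #include <stdint.h>

    #define RESULT_PENDING UINT64_MAX   /* assumed "no result yet" marker */

    struct task {
        void (*run)(struct task *);     /* work another processor performs */
        _Atomic uint64_t result;        /* stored in shared memory 408     */
    };

    void enqueue_task(struct task *t);  /* hypothetical: add t to a shared memory queue */

    /* Initiating processor: add the task, keep executing its own work,
     * and only wait on the shared memory location once the result is needed. */
    uint64_t run_concurrently(struct task *t)
    {
        atomic_store(&t->result, RESULT_PENDING);
        enqueue_task(t);                /* another processor will execute it */

        /* ... continue with independent work here ... */

        uint64_t r;
        while ((r = atomic_load(&t->result)) == RESULT_PENDING)
            ;                           /* wait until the result appears */
        return r;
    }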

[0174] In some non-limiting embodiments or aspects, each processor 404 of a plurality of processors 404 may not be configured to interact with other processors 404 of the plurality of processors 404 to perform a hardware cache coherence protocol. For example, each processor 404 of the plurality of processors 404 may execute an independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 404 to interact with a cache memory of each processor respectively (e.g., processor 404-1 may interact with a cache memory associated with processor 404-1, processor 404-2 may interact with a cache memory associated with processor 404-2, etc.). Each processor 404 of the plurality of processors 404 may execute the independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 404 to interact with shared memory 408. Each processor 404 of the plurality of processors 404 may execute the independent-cache collaboration algorithm where the independent-cache collaboration algorithm may be configured to cause each processor 404 to not interact (e.g., be unable to interact, be unable to access, and/or the like) with a cache memory resource (e.g., cache memory resource 406) of each other processor (e.g., processor 404-1 is not capable of interacting with cache memory resource 406-2 associated with processor 404-2, cache memory resource 406-3 associated with processor 404-3, etc., and processor 404-2 is not capable of interacting with cache memory resource 406-1 associated with processor 404-1, cache memory resource 406-3 associated with processor 404-3, etc.).

[0175] In some non-limiting embodiments or aspects, the independent-cache collaboration algorithm executing on each processor 404 may include a software application and/or software program that may be reentrant. For example, a first instance of the independent-cache collaboration algorithm may be interrupted while executing on processor 404-1 and the independent-cache collaboration algorithm (e.g., the software program of the independent-cache collaboration algorithm) may be subsequently called (e.g., commanded to execute by a processor) to initiate a second instance of the independent-cache collaboration algorithm executing on processor 404-1 before the first instance of the independent-cache collaboration algorithm completes execution.

[0176] Cache memory resource 406 (e.g., a plurality of cache memory resources, referred to individually as cache memory resource 406 and collectively as cache memory resources 406 where appropriate) may include a cache memory resource (e.g., cache memory) internal to and/or associated with a processor (e.g., processors 404-1 to 404-n). In some non-limiting embodiments or aspects, cache memory resource 406 may include a level 1 and/or primary cache, a level 2 and/or secondary cache, and/or a level 3 and/or tertiary cache. In some non-limiting embodiments or aspects, cache memory resource 406 may include a storage component (e.g., a volatile storage component) that stores information and/or instructions for use by processor 404. In some non-limiting embodiments or aspects, cache memory resource 406 may include CPU memory of processor 404. In some non-limiting embodiments or aspects, cache memory resource 406 may include random access memory (RAM) and/or another type of static storage device.

[0177] In some non-limiting embodiments or aspects, cache memory resource 406 may store information and/or software related to the operation and use of processor 404. For example, cache memory resource 406 may include a type of computer-readable medium. In some non-limiting embodiments or aspects, cache memory resource pool 416 may include a single hardware resource in which cache memory resources 406 are allocated and/or partitioned based on cache memory allocation requests generated by processors 404. In some non-limiting embodiments or aspects, each cache memory resource 406 may transmit information to and/or receive information from each processor 404 respectively (e.g., cache memory resource 406-1 associated with processor 404-1 may transmit information to and/or receive information from processor 404-1, cache memory resource 406-2 associated with processor 404-2 may transmit information to and/or receive information from processor 404-2, etc.). In some non-limiting embodiments or aspects, each cache memory may be implemented by (e.g., part of) each processor 404 respectively.

[0178] In some non-limiting embodiments or aspects, cache memory resource 406 may be referred to as independent-cache memory, as each cache memory resource 406 may only be accessible by processor 404 associated with cache memory resource 406 (e.g., cache memory resource 406-1 associated with processor 404-1 is an independent cache memory to processor 404-1 as it is only accessible by processor 404-1, cache memory resource 406-2 associated with processor 404-2 is an independent cache memory to processor 404-2 as it is only accessible by processor 404-2, etc.). In this way, processor 404 may dynamically allocate cache memory resource 406 based on an address of cache memory resource pool 416 and a size of cache memory to allocate. The sizes of cache memory resources 406 may differ from or be the same as the sizes of other cache memory resources 406 based on a size included in the cache memory allocation request. This may provide for dynamic allocation of cache memory from a single cache memory resource pool that is accessible by all processors in the system, where each processor may only access a specific cache memory resource in the cache memory resource pool once the cache memory resource has been allocated.

[0179] Shared memory 408 may include RAM, read only memory (ROM), and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and/or instructions for use by processor 404. In some non-limiting embodiments or aspects, shared memory 408 may include a computer-readable medium that may store information and/or software related to the operation and use of processor 404. For example, shared memory 408 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid-state disk, etc.) and/or another type of computer-readable medium. In some non-limiting embodiments or aspects, shared memory 408 may include memory that is mapped to one or more processes (e.g., processors executing software instructions, software applications, and/or the like) such that the one or more processors may read and/or write data to the memory without interacting with an operating system of a computing device (e.g., without calling functions of the operating system). In some non-limiting embodiments or aspects, shared memory 408 may transmit information to and/or receive information from processor 404. In some non-limiting embodiments or aspects, shared memory 408 may include a plurality of shared memory addresses. Each shared memory address may include an identifier of a shared memory location (e.g., a shared memory block) in shared memory 408. In some non-limiting embodiments or aspects, system 400 may include only one instance (e.g., one copy) of shared memory 408. Alternatively, system 400 may include a plurality of instances of shared memory 408, where each instance of shared memory 408 is identical (e.g., multiple copies of shared memory 408 where each instance is a mirrored copy of shared memory 408). Shared memory 408 may not include a cache memory (e.g., a separate and/or integrated cache memory) associated with shared memory 408. In some non-limiting embodiments or aspects, shared memory 408 without a cache memory associated with shared memory 408 may be referred to as cacheless shared memory.

[0180] As shown in FIG. 4 and as described herein, in order to execute tasks on processor 404 in system 400, the resources required may include access to three pointers that point to shared memory: (1) a first pointer to an address for queue of shared memory allocation requests 410, (2) a second pointer to queue of cache memory allocation requests 412, and (3) a third pointer to a plurality of job and/or task queues for processor 404, depending on a specific task assigned to be performed by processor 404. Additionally, shared memory 408 may include shared memory blocks storing one or more shared pointers or parameters used for memory allocation (e.g., shared memory allocation and/or cache memory allocation). In this way, multiple processors 404 may each execute separate tasks concurrently using allocated memory in both shared memory 408 and cache memory resource pool 416.

[0181] In some non-limiting embodiments, cache memory resource 406 may include a resource that contains static data (e.g., a program), where cache memory resource 406 is a common static cache memory resource that is accessible by multiple processors 404 that have access to a shared static pointer in shared memory 408. For example, cache memory resource 406 may include data that cannot be changed (e.g., written) by processors 404, and the data may be accessed by processors 404 (e.g., multiple processors 404) that have access to the shared static pointer in shared memory 408. In this way, multiple (e.g., one or more) processors 404 may access and/or share static data residing in cache memory resource 406 (e.g., a single cache memory resource 406) to facilitate concurrent reading of static data and/or concurrent execution of program instructions. In some non-limiting embodiments or aspects, queue of cache memory allocation requests 412 may request a pointer to a common static cache memory resource (e.g., one cache memory resource 406, such as cache memory resource 406-1). In this way, multiple processors 404 may access the static data in cache memory resource 406 by referencing the static shared pointer in shared memory 408.
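A minimal C sketch of this read-only sharing pattern follows; all names are illustrative assumptions. Because the data is never written after setup, concurrent readers observe a consistent view without per-access locking or hardware coherence support.

#include <stddef.h>

typedef struct static_resource {
    const unsigned char *data;  /* immutable data, e.g., a program */
    size_t               size;
} static_resource_t;

/* In a real system this pointer would live in the mapped shared region
 * (shared memory 408); it is written once during setup and is read-only
 * thereafter. */
const static_resource_t *shared_static_ptr;

/* Any processor may dereference the shared pointer and read the static
 * data concurrently; since no processor writes the data, every reader
 * sees the same values. */
const unsigned char *read_static_byte(size_t offset) {
    const static_resource_t *r = shared_static_ptr;
    return (r != NULL && offset < r->size) ? &r->data[offset] : NULL;
}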

[0182] Referring now to FIG. 5, shown is a diagram of a non-limiting embodiment or aspect of an exemplary environment 500 in which systems, products, and/or methods, as described herein, may be implemented. As shown in FIG. 5, environment 500 may include independent-cache collaboration system 502, computing device 504, client device 506, server 508, and communication network 510. In some non-limiting embodiments or aspects, each of computing device 504, client device 506, server 508, and/or communication network 510 may be implemented by (e.g., part of) independent-cache collaboration system 502. In some non-limiting embodiments or aspects, at least one of each of computing device 504, client device 506, server 508, and/or communication network 510 may be implemented by (e.g., part of) another system, another device, another group of systems, or another group of devices, separate from or including independent-cache collaboration system 502, such as computing device 504, client device 506, server 508, and/or the like.

[0183] Independent-cache collaboration system 502 may include one or more devices capable of receiving information from and/or communicating information to computing device 504, client device 506, and/or server 508 via communication network 510. For example, independent-cache collaboration system 502 may include a computing device, such as a server, a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, independent-cache collaboration system 502 may be associated with a server as described herein. In some non-limiting embodiments or aspects, independent-cache collaboration system 502 may be in communication with a data storage device (e.g., database, memory, shared memory, cache memory, and/or the like), which may be local or remote to independent-cache collaboration system 502. In some non-limiting embodiments or aspects, independent-cache collaboration system 502 may be capable of receiving information from, storing information in, communicating information to, or searching information stored in the data storage device.

[0185] Computing device 504 may include one or more devices capable of receiving information from and/or communicating information to independent-cache collaboration system 502, client device 506, and/or server 508 via communication network 510. For example, computing device 504 may include a computing device, such as a server, a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, computing device 504 may be associated with a server, a client device, and/or a user device as described herein.

[0185] Client device 506 may include one or more devices capable of receiving information from and/or communicating information to independent-cache collaboration system 502, computing device 504, and/or server 508 via communication network 510. Additionally or alternatively, one or more client devices 506 may include a device capable of receiving information from and/or communicating information to other client devices 506 via communication network 510, another network (e.g., an ad hoc network, a local network, a private network, a virtual private network, and/or the like), and/or any other suitable communication technique. For example, client device 506 may include a user device and/or the like.

[0186] Communication network 510 may include one or more wired and/or wireless networks. For example, communication network 510 may include a cellular network (e.g., a long-term evolution (LTE®) network, a third generation (3G) network, a fourth generation (4G) network, a fifth generation (5G) network, a code division multiple access (CDMA) network, and/or the like), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the public switched telephone network (PSTN)), a private network (e.g., a private network associated with independent-cache collaboration system 502), an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, and/or the like, and/or a combination of these or other types of networks.

[0187] The number and arrangement of systems, devices, and/or networks shown in FIG. 5 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 5. Furthermore, two or more systems or devices shown in FIG. 5 may be implemented within a single system or device, or a single system or device shown in FIG. 5 may be implemented as multiple, distributed systems or devices. Additionally or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of environment 500 may perform one or more functions described as being performed by another set of systems or another set of devices of environment 500.

[0188] In some non-limiting embodiments or aspects, a hardware cache coherence protocol may refer to a method and/or protocol implemented in hardware components of a computing device such that the components of the computing device maintain a coherent and/or consistent view of data (e.g., a consistent definition and/or a consistent value of a data resource) that is transmitted between components and read and/or written (e.g., shared) by components of the computing device. In some non-limiting embodiments or aspects, maintaining a coherent and/or consistent view of data that is shared by components of a computing device may include tracking the status of the data that is shared. In some non-limiting embodiments or aspects, a hardware cache coherence protocol may be based on a directory in memory, snooping on a bus (e.g., detecting addresses that are passed via the bus), and/or the like.
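By contrast, when no hardware cache coherence protocol is present, software can maintain a consistent view by locking a shared location, loading its value into a processor's local (cache) copy, operating on the copy, and writing the result back before unlocking. The following C sketch approximates the semaphore-based lock with a C11 atomic spin lock for brevity; on hardware whose caches are truly non-coherent, the write-back step would additionally require an explicit cache flush, which is omitted here. All names are illustrative assumptions.

#include <stdatomic.h>
#include <stdint.h>

typedef struct shared_slot {
    atomic_flag lock;   /* software lock guarding this shared location;
                           initialize with ATOMIC_FLAG_INIT             */
    uint64_t    value;  /* the shared data resource                     */
} shared_slot_t;

/* Lock the shared address, load the value into a processor-local copy,
 * operate on the copy, write the result back, and unlock; no other
 * processor can observe a stale or partially updated value. */
void update_shared(shared_slot_t *slot, uint64_t (*op)(uint64_t)) {
    while (atomic_flag_test_and_set_explicit(&slot->lock,
                                             memory_order_acquire))
        ;                               /* spin until the address is locked */
    uint64_t cached = slot->value;      /* load into local cache copy  */
    cached = op(cached);                /* operate on the cache copy   */
    slot->value = cached;               /* update the shared resource  */
    atomic_flag_clear_explicit(&slot->lock, memory_order_release);
}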

[0189] Referring now to FIG. 6, shown is a diagram of example components of a device 600 according to non-limiting embodiments or aspects. Device 600 (and/or at least one component of device 600) may correspond to at least one of independent-cache collaboration system 102, processor 104, cache memory 106, and/or shared memory 108 in FIG. 1 and/or at least one of independent-cache collaboration system 502, computing device 504, client device 506, and/or server 508 in FIG. 5, as an example. In some non-limiting embodiments or aspects, such systems or devices in FIG. 1 or FIG. 5 may include at least one device 600 and/or at least one component of device 600. In some non-limiting embodiments or aspects, device 600 may not implement a hardware cache coherence protocol. The number and arrangement of components shown in FIG. 6 are provided as an example. In some non-limiting embodiments or aspects, device 600 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 6. Additionally, or alternatively, a set of components (e.g., one or more components) of device 600 may perform one or more functions described as being performed by another set of components of device 600.

[0190] As shown in FIG. 6, device 600 may include bus 602, processor 604, memory 606, storage component 608, input component 610, output component 612, and communication interface 614. Bus 602 may include a component that permits communication among the components of device 600. In some non-limiting embodiments or aspects, bus 602 may not implement a hardware cache coherence protocol. In some non-limiting embodiments or aspects, processor 604 may be implemented in hardware, software (e.g., firmware), or a combination of hardware and software. For example, processor 604 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.), a microprocessor, a digital signal processor (DSP), and/or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) that can be programmed to perform a function. In some non-limiting embodiments or aspects, processor 604 may be the same as or similar to processor 104. Memory 606 may include RAM, ROM, and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and/or instructions for use by processor 604. In some non-limiting embodiments or aspects, memory 606 may be the same as or similar to cache memory 106 and/or shared memory 108.

[0191] With continued reference to FIG. 6, storage component 608 may store information and/or software related to the operation and use of device 600. For example, storage component 608 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid-state disk, etc.) and/or another type of computer-readable medium. In some non-limiting embodiments or aspects, storage component 608 may be the same as or similar to cache memory 106 and/or shared memory 108. Input component 610 may include a component that permits device 600 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, etc.). Additionally, or alternatively, input component 610 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Output component 612 may include a component that provides output information from device 600 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), etc.). Communication interface 614 may include a transceiver-like component (e.g., a transceiver, a receiver, a transmitter, a separate receiver and transmitter pair, etc.) that enables device 600 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 614 may permit device 600 to receive information from another device and/or provide information to another device. For example, communication interface 614 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi® interface, a cellular network interface, and/or the like.

[0192] Device 600 may perform one or more processes described herein. Device 600 may perform these processes based on processor 604 executing software instructions stored by a computer-readable medium, such as memory 606 and/or storage component 608. A computer-readable medium may include any non-transitory memory device. A memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices. Software instructions may be read into memory 606 and/or storage component 608 from another computer-readable medium, or from another device via communication interface 614. When executed, software instructions stored in memory 606 and/or storage component 608 may cause processor 604 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software. The term “programmed or configured,” as used herein, refers to an arrangement of software, hardware circuitry, or any combination thereof on one or more devices.

[0193] Although embodiments have been described in detail for the purpose of illustration, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed embodiments or aspects, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any embodiment or aspect can be combined with one or more features of any other embodiment or aspect.
