
Title:
CONFLICT PREVENTION IN CLUSTER SCHEDULING IN A DATA PROCESSING SYSTEM
Document Type and Number:
WIPO Patent Application WO/2020/162800
Kind Code:
A1
Abstract:
A system and method to allocate resources in a data processing system (800). In one embodiment, an apparatus (840, 1200), and related method (1000), is configured to receive a request for resource views from application schedulers (830) in response to a plurality of user requests to execute applications, categorize ones of the plurality of user requests into scheduling units, and provide the scheduling units to a resource manager (820) to enable the resource manager (820) to construct the resource views for the application schedulers (830).

Inventors:
SEDAGHAT MINA (SE)
SKÖLDSTRÖM PONTUS (SE)
Application Number:
PCT/SE2019/050094
Publication Date:
August 13, 2020
Filing Date:
February 05, 2019
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G06F9/48; G06F9/50
Foreign References:
US 9697045 B2, 2017-07-04
US 2019/0014059 A1, 2019-01-10
Other References:
HINDMAN, BENJAMIN ET AL.: "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center", NSDI, vol. 11, 2011
SCHWARZKOPF, MALTE ET AL.: "Proceedings of the 8th ACM European Conference on Computer Systems", 2013, ACM, article "Omega: Flexible, Scalable Schedulers for Large Compute Clusters"
HINDMAN, BENJAMIN ET AL.: "A Common Substrate for Cluster Computing", HOTCLOUD, 2009
Attorney, Agent or Firm:
SJÖBERG, Mats (SE)
Claims:
IN THE CLAIMS:

1. An apparatus (840, 1200) in a data processing system (800), comprising: processing circuitry (1210), configured to:

receive a request for resource views from application schedulers (830) in response to a plurality of user requests to execute applications;

categorize ones of said plurality of user requests into scheduling units; and provide said scheduling units to a resource manager (820) to enable said resource manager (820) to construct said resource views for said application schedulers (830).

2. The apparatus (840, 1200) as recited in Claim 1 wherein each of said scheduling units corresponds to said plurality of user requests that share ones of said resource views.

3. The apparatus (840, 1200) as recited in Claim 1 wherein said processing circuitry (1210) is configured to categorize ones of said plurality of user requests into said scheduling units to reduce risk of conflict between said application schedulers (830).

4. The apparatus (840, 1200) as recited in Claim 1 wherein said processing circuitry (1210) is configured to provide said scheduling units to instruct said resource manager (820) to build a shared resource view wherein ones of said application schedulers (830) schedule resources in parallel on a full cluster with a low risk of conflict.

5. The apparatus (840, 1200) as recited in Claim 1 wherein said processing circuitry (1210) is configured to provide said scheduling units to instruct said resource manager (820) to build an isolated resource view wherein there is a risk of conflict between ones of said plurality of user requests from corresponding ones of said application schedulers (830), said corresponding ones of said application schedulers (830) running independently and sequentially on a full cluster.

6. The apparatus (840, 1200) as recited in Claim 1 wherein said processing circuitry (1210) is configured to provide said scheduling units to instruct said resource manager (820) to build an isolated resource sub-view wherein ones of said application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict.

7. The apparatus (840, 1200) as recited in Claim 1 wherein said processing circuitry (1210) is configured to provide said scheduling units to instruct said resource manager (820) to allocate resources in time and space to said application schedulers (830) as indicated in said resource views.

8. The apparatus (840, 1200) as recited in Claim 1 wherein said resource views allocate resources selected from the group consisting of:

servers;

processors;

memory;

bandwidth of communication systems;

transmit power levels; and

intervals of time to operate said resources.

9. The apparatus (840, 1200) as recited in Claim 1 wherein said resource manager (820) operates in a first level (825) and said application schedulers (830) operate in a second level (835).

10. The apparatus (840, 1200) as recited in Claim 9 wherein said apparatus (840, 1200) operates in a level accessible by said first level (825) and said second level (835).

11. A method (1000) of operating a data processing system (800), comprising: receiving (1020) a request for resource views from application schedulers (830) in response to a plurality of user requests to execute applications;

categorizing (1030) ones of said plurality of user requests into scheduling units; and

providing (1040) said scheduling units to a resource manager (820) to enable said resource manager (820) to construct said resource views for said application schedulers (830).

12. The method (1000) as recited in Claim 11 wherein each of said scheduling units corresponds to said plurality of user requests that share ones of said resource views.

13. The method (1000) as recited in Claim 11 wherein said categorizing (1030) comprises categorizing said ones of said plurality of user requests into said scheduling units to reduce risk of conflict between said application schedulers (830).

14. The method (1000) as recited in Claim 11 wherein said providing (1040) comprises instructing said resource manager (820) to build a shared resource view wherein ones of said application schedulers (830) schedule resources in parallel on a full cluster with a low risk of conflict.

15. The method (1000) as recited in Claim 11 wherein said providing (1040) comprises instructing said resource manager (820) to build an isolated resource view wherein there is a risk of conflict between ones of said plurality of user requests from corresponding ones of said application schedulers (830), said corresponding ones of said application schedulers (830) running independently and sequentially on a full cluster.

16. The method (1000) as recited in Claim 11 wherein said providing (1040) comprises instructing said resource manager (820) to build an isolated resource sub-view wherein ones of said application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict.

17. The method (1000) as recited in Claim 11 wherein said providing (1040) comprises instructing said resource manager (820) to allocate resources in time and space to said application schedulers (830) as indicated in said resource views.

18. The method (1000) as recited in Claim 11 wherein said resource views allocate resources selected from the group consisting of:

servers;

processors;

memory;

bandwidth of communication systems;

transmit power levels; and

intervals of time to operate said resources.

19. The method (1000) as recited in Claim 11 wherein said resource manager (820) operates in a first level (825) and said application schedulers (830) operate in a second level (835).

20. The method (1000) as recited in Claim 19 wherein said method (1000) operates in a level accessible by said first level (825) and said second level (835).

21. A scheduler (810, 1200) in a data processing system (800), comprising: application schedulers (830) configured to receive a plurality of user requests to execute applications;

an arbiter (840) configured to receive a request for resource views from said application schedulers (830) in response to said plurality of user requests, and categorize ones of said plurality of user requests into scheduling units; and

a resource manager (820) configured to receive said scheduling units, and construct said resource views for said application schedulers (830).

22. The scheduler (810, 1200) as recited in Claim 21 wherein each of said scheduling units corresponds to said plurality of user requests that share ones of said resource views.

23. The scheduler (810, 1200) as recited in Claim 21 wherein said arbiter (840) is configured to categorize said ones of said plurality of user requests into said scheduling units to reduce risk of conflict between said application schedulers (830).

24. The scheduler (810, 1200) as recited in Claim 21 wherein said resource manager (820) is configured to build a shared resource view wherein ones of said application schedulers (830) schedule resources in parallel on a full cluster with a low risk of conflict in accordance with said scheduling units.

25. The scheduler (810, 1200) as recited in Claim 21 wherein said resource manager (820) is configured to build an isolated resource view wherein there is a risk of conflict between ones of said plurality of user requests from corresponding ones of said application schedulers (830) in accordance with said scheduling units, said corresponding ones of said application schedulers (830) running independently and sequentially on a full cluster.

26. The scheduler (810, 1200) as recited in Claim 21 wherein said resource manager (820) is configured to build an isolated resource sub-view wherein ones of said application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict in accordance with said scheduling units.

27. The scheduler (810, 1200) as recited in Claim 21 wherein said resource manager (820) is configured to allocate resources in time and space to said application schedulers (830) as indicated in said resource views in accordance with said scheduling units.

28. The scheduler (810, 1200) as recited in Claim 21 wherein said resource views allocate resources selected from the group consisting of:

servers;

processors;

memory;

bandwidth of communication systems;

transmit power levels; and

intervals of time to operate said resources.

29. The scheduler (810, 1200) as recited in Claim 21 wherein said resource manager (820) operates in a first level (825) and said application schedulers (830) operate in a second level (835).

30. The scheduler (810, 1200) as recited in Claim 29 wherein said arbiter (840) operates in a level accessible by said first level (825) and said second level (835).

31. A method (1100) of operating a data processing system (800), comprising: receiving (1120) a plurality of user requests to execute applications;

categorizing (1130) ones of said plurality of user requests into scheduling units in response to said plurality of user requests; and

constructing (1140) resource views in response to said scheduling units.

32. The method (1100) as recited in Claim 31 wherein each of said scheduling units corresponds to said plurality of user requests that share ones of said resource views.

33. The method (1100) as recited in Claim 31 wherein said categorizing (1130) comprises categorizing said ones of said plurality of user requests into said scheduling units to reduce risk of conflict.

34. The method (1100) as recited in Claim 31 wherein said constructing (1140) comprises building a shared resource view wherein resources are scheduled in parallel on a full cluster with a low risk of conflict.

35. The method (1100) as recited in Claim 31 wherein said constructing (1140) comprises building an isolated resource view wherein there is a risk of conflict between ones of said plurality of user requests from corresponding ones of application schedulers (830), said corresponding ones of said application schedulers (830) running independently and sequentially on a full cluster.

36. The method (1100) as recited in Claim 31 wherein said constructing (1140) comprises building an isolated resource sub-view wherein ones of application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict in accordance with said scheduling units.

37. The method (1100) as recited in Claim 31 further comprising allocating (1150) resources in time and space as indicated in said resource views in accordance with said scheduling units.

38. The method (1100) as recited in Claim 31 wherein said resource views allocate resources selected from the group consisting of:

servers;

processors;

memory;

bandwidth of communication systems;

transmit power levels; and

intervals of time to operate said resources.

39. The method (1100) as recited in Claim 31 wherein said method (1100) is performed by a two-level scheduler (810).

40. The method (1100) as recited in Claim 31 wherein said receiving (1120) is performed at a first level (825) of said two-level scheduler (810), said constructing (1140) is performed at a second level (835) of said two-level scheduler (810), and said categorizing (1130) is performed at a level accessible by said first level (825) and said second level (835).

Description:
CONFLICT PREVENTION IN CLUSTER SCHEDULING

IN A DATA PROCESSING SYSTEM

TECHNICAL FIELD

The present disclosure is directed, in general, to the field of cloud computing and, more specifically, to a system and method to allocate resources in a data processing system.

BACKGROUND

Current cloud platforms support diverse workloads and application frameworks such as, without limitation, the Hadoop operating system and message passing interface (“MPI”). Each application framework has different scheduling needs based on its programming model, communication patterns, task dependencies, and data placement.

Two-level schedulers such as those described by Hindman, Benjamin, et al., in an article entitled “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” NSDI, Vol. 11, 2011 (also referred to as the “Mesos article”), and by Schwarzkopf, Malte, et al., in an article entitled “Omega: Flexible, Scalable Schedulers for Large Compute Clusters,” Proceedings of the 8th ACM European Conference on Computer Systems, ACM, 2013 (also referred to as the “Omega article”), are scheduling platforms for sharing and managing cluster resources between multiple application frameworks. They are introduced to support and simplify sophisticated scheduling requirements of different applications. The general idea is that no single (scheduling) framework is optimal for all types of applications, and therefore organizations should instead be enabled to run a plurality of application schedulers efficiently in the same cloud computing environment, as described in an article by Hindman, Benjamin, et al., entitled “A Common Substrate for Cluster Computing,” in HotCloud, 2009. The aforementioned articles are incorporated herein by reference.

In two-level schedulers, a resource manager offers a coarse view of available resources in a cluster to a group of distributed application schedulers. This view may not be up-to-date as a plurality of application schedulers compete for resources and may claim the same resource concurrently. Overlapping claims for the same resources can produce scheduling conflicts.

It is highly desirable, therefore, to efficiently allocate resources by a plurality of application schedulers in a data processing system. A resource allocation process that addresses the aforementioned issues can enhance the efficiency with which resources are allocated in a data processing system including a data center, or the like.

SUMMARY

These and other problems are generally solved or circumvented, and technical advantages are generally achieved, by advantageous embodiments of the present disclosure for a system and method to allocate resources in a data processing system. In one embodiment, an apparatus, and related method, is configured to receive a request for resource views from application schedulers in response to a plurality of user requests to execute applications, categorize ones of the plurality of user requests into scheduling units, and provide the scheduling units to a resource manager to enable the resource manager to construct the resource views for the application schedulers.

In another embodiment, a scheduler in a data processing system includes application schedulers, an arbiter and a resource manager, and a method of operating the same. The application schedulers are configured to receive a plurality of user requests to execute applications. The arbiter is configured to receive a request for resource views from the application schedulers in response to the plurality of user requests, and categorize ones of the plurality of user requests into scheduling units. The resource manager is configured to receive the scheduling units, and construct the resource views for the application schedulers.

The foregoing has outlined rather broadly the features and technical advantages of the present disclosure in order that the detailed description of the disclosure that follows may be better understood. Additional features and advantages of the disclosure will be described hereinafter, which form the subject of the claims of the disclosure. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures or processes for carrying out the same purposes of the present disclosure. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the disclosure as set forth in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIGURE 1 illustrates a block diagram of an embodiment of a two-level scheduler; FIGURE 2 illustrates a block diagram of an embodiment of a two-level scheduler employing an optimistic approach;

FIGURES 3 to 5 illustrate block diagrams of embodiments of a two-level scheduler employing a pessimistic approach;

FIGURES 6 and 7 illustrate block diagrams of embodiments of a two-level scheduler;

FIGURE 8 illustrates a block diagram of an embodiment of a data processing system including a two-level scheduler demonstrating an operation thereof;

FIGURE 9 illustrates a block diagram of an embodiment of an arbiter;

FIGURES 10 and 11 illustrate flow diagrams of embodiments of a method of operating a data processing system; and

FIGURE 12 illustrates a block diagram of an embodiment of a communication node in a data processing system.

Corresponding numerals and symbols in the different figures generally refer to corresponding parts unless otherwise indicated, and may not be redescribed in the interest of brevity after the first instance. The FIGURES are drawn to illustrate the relevant aspects of exemplary embodiments.

DETAILED DESCRIPTION

The making and using of the present exemplary embodiments are discussed in detail below. It should be appreciated, however, that the embodiments provide many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed are merely illustrative of specific ways to make and use the systems, subsystems, and modules associated with a system and method to allocate resources in a data processing system including a data center.

A system will be described herein with respect to exemplary embodiments in a specific context, namely, a scheduler that allocates resources in a data processing system. While the principles will be described in the environment of a cloud data processing system, any environment, such as a local data processing system, that may benefit from such a system and method that enables these functionalities is well within the broad scope of the present disclosure. As contemplated herein, resources allocable by a scheduler include, without limitation, servers and portions thereof, microprocessors and microprocessor cores, memory, bandwidth of communication systems, transmitter power levels and associated receivers, and intervals of time over which such elements are enabled to operate. A two-level scheduler is a scheduling platform for sharing and managing cluster resources between multiple application frameworks. A two-level scheduler includes a resource manager (level one) and a set of application schedulers running in parallel (level two). In this design, an application scheduler, upon each application arrival, requests a resource view of available resources from the resource manager. The resource manager decides which resources to offer to each application, and the application scheduler decides which resources to accept and allocate and, finally, which tasks to execute on top of the allocated resources. Each application scheduler is calibrated and designed to support the needs of a particular application framework. No single scheduling framework will be optimal for all types of applications, and therefore organizations should be enabled to run multiple application schedulers efficiently in the same cloud computing environment.

Turning now to FIGURE 1, illustrated is a block diagram of an embodiment of a two-level scheduler 100. The two-level scheduler 100 is formed with a resource manager (level one) 110 and a set of application schedulers (“AS,” collectively designated 120) running in parallel (level two). In this design, the application scheduler 120, upon each application arrival, requests a resource view on available resources from the resource manager 110. The resource manager 110 decides which resources (collectively designated 130) to offer to each application, and the application scheduler 120 decides which resources 130 to accept and allocate, and finally which tasks to execute on top of the allocated resources 130. The application scheduler 120 is calibrated and designed to support the needs of a particular application framework.
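The request/offer/accept flow just described can be sketched in a few lines. This is a minimal illustration, not the patented implementation; the class names, the demand model, and the commit step are all assumptions made for clarity.

```python
class ResourceManager:
    """Level one: tracks free resources and offers views to application schedulers."""

    def __init__(self, resources):
        self.free = set(resources)

    def offer_view(self):
        # Offer a (possibly stale) view of all currently free resources.
        return set(self.free)

    def commit(self, claimed):
        # Remove the accepted resources from the free pool.
        self.free -= claimed


class AppScheduler:
    """Level two: picks resources from an offered view to run its tasks on."""

    def __init__(self, name, demand):
        self.name, self.demand = name, demand

    def schedule(self, view):
        # Accept just enough resources from the offered view to meet the demand.
        return set(list(view)[: self.demand])


rm = ResourceManager({"server1", "server2", "server3", "server4"})
scheduler = AppScheduler("framework-A", demand=2)
claim = scheduler.schedule(rm.offer_view())
rm.commit(claim)
```

Note that `offer_view` hands out a copy of the free set: nothing in this sketch prevents a second scheduler from claiming the same servers concurrently, which is exactly the conflict problem the disclosure addresses.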

To prevent or resolve conflicts produced by two-level schedulers, a plurality of solutions can be employed. In a “pessimistic approach,” conflicts are avoided by ensuring that a resource is only offered to one application scheduler at a time. (See, e.g., the “Mesos article” referenced above.) This is achieved by either partitioning the full set of resources into independent resource views that are offered to different application schedulers or by offering all the available cluster resources to a single application scheduler at a time. Since only a single application scheduler is examining a particular set of resources (either the full set or a partition of it) at any point in time, it effectively holds a lock on those resources for the duration of a scheduling decision.

In the pessimistic approach, scheduling parallelism is reduced as resources are locked for each application scheduler, or else visibility of resources for each application scheduler is restricted. The restrictions in parallelism through locking lead to lower scheduling throughput and therefore longer total scheduling time. Offering a fraction of available resources leads to sub-optimal scheduling decisions for picky applications as not all available resources are considered when deciding what should be allocated. (See, e.g., the “Omega article” referenced above.) Thus, the pessimistic approach avoids conflicts, but reduces scheduling parallelism and offers only a fraction of locked resources.
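The space-partitioning variant of the pessimistic approach can be sketched as follows: each scheduler only ever sees a disjoint slice of the cluster, so conflicts cannot arise by construction, at the cost of restricted visibility. The round-robin partitioning rule here is an illustrative assumption; the disclosure does not prescribe how the partitions are formed.

```python
def partition_views(resources, schedulers):
    """Split the full resource set into disjoint sub-views, one per scheduler."""
    views = {s: set() for s in schedulers}
    for i, resource in enumerate(sorted(resources)):
        # Round-robin assignment keeps the sub-views disjoint and roughly equal.
        views[schedulers[i % len(schedulers)]].add(resource)
    return views


cluster = {"s1", "s2", "s3", "s4", "s5", "s6"}
views = partition_views(cluster, ["AS1", "AS2", "AS3"])
```

Disjointness guarantees freedom from conflict, but a picky application confined to one sub-view may miss a better placement available elsewhere in the cluster, which is the sub-optimality noted above.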

In an “optimistic approach,” lock-free consistency control is employed. This approach assumes that scheduling conflicts are rare, and that conflicting claims can be resolved in a safe and inexpensive manner. In this approach, a plurality of application schedulers are simultaneously given access to the full resource view and allocate resources independently and concurrently. An optimistic approach improves parallelism, but potentially increases the amount of wasted scheduling work if conflicts occur frequently. The above-mentioned assumptions are not necessarily true for all workloads. There is also a risk of starving an application in case of recurring conflicts, especially if one conflicting request is abandoned and scheduling is attempted repeatedly. Thus, the optimistic approach improves parallelism, but risks conflicts and increases the amount of wasted scheduling work if conflicts occur frequently.
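Optimistic, lock-free claiming can be sketched as below: every scheduler works against the same full view, and overlapping claims are only detected at commit time, where the loser's claim becomes wasted scheduling work that must be retried. The first-committer-wins resolution rule is an assumption for illustration.

```python
def optimistic_commit(free, claims):
    """Apply claims in arrival order; return what each scheduler won and lost."""
    free = set(free)  # work on a copy; do not mutate the caller's pool
    granted, conflicted = {}, {}
    for scheduler, wanted in claims:
        ok = wanted & free       # still available at commit time
        lost = wanted - free     # already taken by an earlier committer: a conflict
        free -= ok
        granted[scheduler] = ok
        conflicted[scheduler] = lost  # must be re-scheduled (wasted work)
    return granted, conflicted


# AS4 and AS5 both scheduled against the same stale full view and both chose s2.
granted, conflicted = optimistic_commit(
    {"s1", "s2", "s3"},
    [("AS4", {"s1", "s2"}), ("AS5", {"s2", "s3"})],
)
```

If AS5's conflicted portion is repeatedly abandoned and retried against a busy cluster, it can starve, which is the recurring-conflict risk noted above.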

As introduced herein, an arbiter classifies application schedulers into different groups referred to as scheduling units. The scheduling units correspond to a plurality of user requests that share resource views. The arbiter uses this classification to decide how to parallelize the application schedulers or how to share the cluster view among the application schedulers. The arbiter determines for what classes of user requests associated with application schedulers it is safe to concurrently operate on a full resource view, and for which classes of application schedulers the resource view should be either locked and offered sequentially and/or partitioned before being offered to either individual application schedulers or to groups of application schedulers.

The mechanism introduced herein retains the high level of parallelization offered by the optimistic approach (and thus provides higher scheduling throughput) by maintaining the idea of concurrently sharing a resource view. In addition, it reduces the risk of conflicts through a classification scheme that actively reduces or avoids conflicts. The arbiter classifies and controls how to share a cluster resource view, in both time and space, between a plurality of application schedulers. The aim of the arbiter is to reduce the probability of conflicts, and ensure that the raised conflicts are not expensive, from a resource perspective or otherwise, to resolve. For instance, if conflicts are not resolved within an acceptable time and/or scheduling has to be repeated as a result thereof, extra overhead is incurred to resolve the conflict.

The arbiter categorizes the application schedulers into the following groups. First, application schedulers that can schedule resources in parallel on a full cluster with a low risk of conflict, as represented by a shared resource view; these application schedulers are allocated analogous resources in parallel because of the lower risk. Second, application schedulers that should run independently and sequentially on a full cluster, as represented by an isolated resource view wherein there is a risk of conflict between a plurality of user requests from corresponding application schedulers; each such application scheduler is sequentially allocated a dedicated full resource view due to factors such as a higher risk of conflict. Third, application schedulers that schedule resources that can run on a partition of a cluster with a low risk of conflict, as represented by an isolated resource sub-view.
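The three-way categorization above can be sketched as a simple decision rule. The input features and their encoding are illustrative assumptions; the disclosure specifies only the three resulting view types, not how the arbiter's inputs are represented.

```python
def classify_view(conflict_risk, needs_full_view):
    """Map a scheduling unit's profile to one of the three resource views."""
    if conflict_risk == "low" and needs_full_view:
        return "shared"             # run in parallel on the full cluster
    if conflict_risk == "low":
        return "isolated sub-view"  # run in parallel on a cluster partition
    return "isolated"               # run sequentially with a locked full view
```

For example, a picky, low-risk application scheduler would receive the shared full view, while a large, high-risk one would be isolated and run in sequence.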

The arbiter groups the application schedulers to run in parallel with a shared resource view if either the risk of conflict between the application schedulers is low or the cost of conflict resolution is cheap, for example, when, in the case of conflict, the conflicting applications can be co-located without notable performance impact. This would be the case when the cost of performance degradation for the application is cheaper, from a resource perspective or otherwise, than the cost of the decision to roll back and re-schedule. These scenarios can occur when the applications to be scheduled are heterogeneous (e.g., short- versus long-lived applications), have different resource requirements (e.g., central processing unit (“CPU”)-bound versus input/output (“I/O”)-bound), or have different priorities (e.g., low versus high priority). Several examples are provided below.

The application schedulers responsible for scheduling short-lived applications (e.g., an in-memory Spark query) can potentially run in parallel with the application schedulers of long-lived applications with no critical service level agreement (“SLA”). In this case, even if these two applications are both scheduled on the same server simultaneously, the time that the two applications are co-located is on a scale of 0.1 to 1 second and, consequently, the performance impact is momentary and negligible. Thus, the conflict can safely be ignored.

The application schedulers responsible for scheduling low-priority applications can be grouped with schedulers of high-priority applications to be run in parallel. In the case of conflict, the cost of aborting scheduling decisions for the conflicting low-priority applications is negligible and would not impact high-priority operations.

The application schedulers that are responsible for scheduling applications with different resource demands or demand patterns can potentially schedule together. In this case, the applications do not compete over the same resource type, so it is less likely that they choose the same server. For example, an application scheduler scheduling a CPU-bound application (with no critical SLA) can run in parallel with an application scheduler scheduling an I/O-bound process working on a shared resource view. In case of possible conflicts (e.g., the server can be overcommitted), the applications can be collocated without drastic performance degradation, since the performance of the different applications is weakly correlated due to using different resources.

The arbiter can determine if an application requires a full resource view for an improved scheduling decision or if offering a sub-view would be enough for the scheduling without sacrificing the quality of the decision. For example, picky applications (applications with many constraints) often require access to the full state of the cluster as described in the “Omega article” referenced above, whereas application schedulers with soft constraints can schedule their applications using only a fraction of available resources. The application schedulers that schedule applications with specific requirements (e.g., applications that require hardware features such as remote direct memory access (“RDMA”) or a field programmable gate array (“FPGA”)) can work on an isolated sub-view, independent from the rest of the cluster. Having a plurality of application schedulers, each running on an isolated sub-view of resources, can increase parallelization and therefore scheduling throughput.

The arbiter may strictly isolate an application scheduler and run it with a locked resource view if there is a high risk of conflict and the cost of conflict resolution is high. One example is an application scheduler scheduling a relatively large application compared to the offered view; in this case, it is highly probable that most of the resources in the resource offer are accepted and allocated, and therefore the risk of conflict is high. Another example is an application with a strict, short scheduling deadline, where a possible conflict would lead to a violation of the scheduling deadline. Yet another example is a large application for which the risk of repeated conflicts would increase the chance of starvation.

An application is often described with multiple features. For example, an application can be long-lived while at the same time having critical SLAs and therefore high priority. Some of these features are usually correlated (e.g., applications with critical SLAs are often high-priority as well). The arbiter weighs the features against each other to devise a decision. In other words, the different properties of each feature are correlated to devise the decision.

The arbiter profiles the application schedulers using a rough estimate of the following example features of their associated applications, as illustrated in Table 1 below.

Table 1: Application Features

Thus, as described hereinabove, an arbiter uses profiles to group user requests into scheduling units. Each scheduling unit is formed with a single application scheduler or a group of application schedulers that can run in parallel using a shared resource view. If a scheduling unit includes only one application scheduler, this application scheduler runs in isolation, and in sequence with other scheduling units. The arbiter also decides and controls how to share the resource view between the application schedulers within a scheduling unit (i.e., if a view needs to be sub-divided). The arbiter also iteratively collects, analyzes, and identifies conflicts, and uses this information to update the application schedulers’ profiles and make better decisions over time. Thus, the scheduling units correspond to a plurality of user requests that share resource views. The arbiter categorizes the plurality of user requests into the scheduling units to reduce the risk of conflict between the application schedulers.
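Profile-based grouping can be sketched as a weighted risk score: features are weighed into a single conflict-risk value, low-risk schedulers share one scheduling unit, and each high-risk scheduler becomes its own single-member unit that runs in isolation. The feature names, weights, and cutoff are all illustrative assumptions, not values from the disclosure.

```python
# Hypothetical feature weights; the disclosure names no concrete values.
WEIGHTS = {"long_lived": 0.3, "high_priority": 0.4, "picky": 0.3}


def risk(profile):
    """Weigh an application scheduler's features into a single risk score."""
    return sum(WEIGHTS[feature] for feature, present in profile.items() if present)


def group_into_units(profiles, cutoff=0.5):
    """Low-risk schedulers share one unit; each high-risk one runs alone."""
    shared = [name for name, p in profiles.items() if risk(p) < cutoff]
    isolated = [[name] for name, p in profiles.items() if risk(p) >= cutoff]
    return ([shared] if shared else []) + isolated


units = group_into_units({
    "AS1": {"long_lived": False, "high_priority": False, "picky": False},  # risk 0.0
    "AS2": {"long_lived": True, "high_priority": False, "picky": False},   # risk 0.3
    "AS3": {"long_lived": True, "high_priority": True, "picky": True},     # risk 1.0
})
```

The iterative feedback loop described above would, in such a sketch, amount to adjusting the weights or profiles as observed conflicts accumulate.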

Turning now to FIGURE 2, illustrated is a block diagram of an embodiment of a two-level scheduler 200 employing an optimistic approach. The two-level scheduler 200 is employing the optimistic approach with high parallelization, but with a chance of conflict. A resource manager 210 operating in level one 220 provides a full resource view employing an optimistic approach with parallel scheduling by application schedulers (“AS”) running in level two 230. As illustrated, all the application schedulers (designated AS 1, 2, 3, 4, 5) run in parallel and produce a conflict in resource allocation between application scheduler 4 and application scheduler 5, which is a likely result of employing an optimistic approach. This arrangement assumes that scheduling conflicts are rare, and conflicting claims can be resolved in a safe and cheap manner.

Turning now to FIGURE 3, illustrated is a block diagram of an embodiment of a two-level scheduler 300 employing a pessimistic approach. As illustrated, a resource manager 310 operating in level one 320 provides a full resource view employing a pessimistic approach with space partitioning. The resource manager 310 provides sub views (designated SV 1, 2, 3, 4, 5) to corresponding application schedulers (designated AS 1, 2, 3, 4, 5) operating in level two 330. This arrangement assumes that scheduling conflicts are high, and conflicting claims cannot be resolved in a safe and cheap manner. In this example, each framework application scheduler is restricted to a sub-view of resources. This approach leads to sub-optimal decisions because of the limited view of resources, but no conflicts.

Turning now to FIGURE 4, illustrated is a block diagram of an embodiment of a two-level scheduler 400 employing a pessimistic approach. In this case, the applications are scheduled sequentially by a resource manager 410 operating in level one 420, as shown by the sequential arrangement of application schedulers (designated AS 1, 2, 3, 4, 5) operating in level two 430. In other words, the resource manager 410 provides a full resource view and employs a pessimistic approach with sequential scheduling.

Turning now to FIGURE 5, illustrated is a block diagram of an embodiment of a two-level scheduler 500 employing a pessimistic approach. In this case, a resource manager 510 operating in level one 520 offers a coarse, but full, view of available resources in the cluster to a group of distributed application schedulers operating in level two 530, with time partitioning for the application schedulers (designated AS 1, 2, 3). Applications are again scheduled temporally sequentially by the resource manager 510, as indicated in FIGURE 5 by the temporally sequential arrangement of application schedulers 1, 2, 3. However, this view may not always be up-to-date, as multiple application schedulers are competing for resources and may claim the same resource concurrently, which can result in scheduling conflicts. Sequential scheduling leads to lower scheduling throughput, delayed scheduling decisions, and a less than optimal arrangement.

Turning now to FIGURE 6, illustrated is a block diagram of an embodiment of a two-level scheduler 600. The two-level scheduler 600 includes an arbiter 610 that instructs a resource manager 620 operating in level one 630, which provides resource views for application schedulers (designated AS 1, 2, 3, 4, 5) operating in level two 640. For applications with a low risk of conflict, resources are scheduled in parallel by the arbiter 610. High-risk application schedulers operating in level two 640 get a full resource view for improved (e.g., optimal) decisions. As illustrated in FIGURE 6, the arbiter 610 selectively groups a portion of the application schedulers (i.e., AS 2, 3, 4) to share the resource view in a common interval of time. Additionally, the arbiter 610 also schedules applications temporally sequentially, as indicated by the temporally sequential arrangement of application scheduler 1, then the group of application schedulers 2, 3, 4, followed by the application scheduler 5.
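The hybrid arrangement of FIGURE 6, where scheduling units are offered the view one after another in time while the schedulers inside a unit run concurrently, might be sketched as follows (the `run_schedule` helper and its API are assumptions for illustration only):

```python
# Sketch of FIGURE 6's hybrid schedule: units are processed
# time-sequentially, and the application schedulers within each unit
# run in parallel against a shared view.

from concurrent.futures import ThreadPoolExecutor

def run_schedule(units, schedule_fn):
    """Run units sequentially; schedulers within a unit in parallel."""
    results = []
    for unit in units:  # each unit gets the resource view in its own interval
        with ThreadPoolExecutor(max_workers=len(unit)) as pool:
            results.extend(pool.map(schedule_fn, unit))
    return results

# FIGURE 6's grouping: AS 1 alone, then AS 2-4 together, then AS 5.
order = run_schedule([["AS1"], ["AS2", "AS3", "AS4"], ["AS5"]],
                     lambda s: s)
```

The `with` block closing between units models the view being locked until the current unit's schedulers have all placed their claims.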

Turning now to FIGURE 7, illustrated is a block diagram of an embodiment of a two-level scheduler 700. The two-level scheduler 700 includes an arbiter 710 that instructs a resource manager 720 operating in level one 730, which provides resource views for application schedulers (designated AS 1, 2, 3, 4, 5) operating in level two 740. Again, for applications with a low risk of conflict, resources are scheduled in parallel by the arbiter 710. High-risk application schedulers operating in level two 740 get a full resource view for improved (e.g., optimal) decisions. As illustrated in FIGURE 7, the arbiter 710 selectively groups a portion of the application schedulers (i.e., AS 1, 5) to share the resource view in a common space with respect to a sub-view (“SV”) 750. Additionally, the arbiter 710 also schedules applications in parallel, as indicated by the parallel arrangement of application schedulers 2, 3, 4 with respect to a sub-view 760.

With continuing reference to FIGURES 6 and 7, the arbiters 610, 710 determine for which classes of application schedulers it is safe to concurrently operate on the full resource view, and for which classes of application schedulers the resource view should be either locked and offered sequentially and/or partitioned before being offered to either individual schedulers or groups of schedulers. The arbiters 610, 710 classify and control how to share the cluster resource view, in both time and space, between multiple application schedulers. The arbiters 610, 710 use this classification to decide how to parallelize schedulers or how to share the cluster view among the application schedulers. The process retains parallelism and reduces the risk of conflict.

Turning now to FIGURE 8, illustrated is a block diagram of an embodiment of a data processing system 800 including a two-level scheduler 810 operable on server(s) (generally designated 850), demonstrating an operation thereof. The servers 850 may form at least a portion of a data center. The two-level scheduler 810 includes a resource manager 820 at level one 825, a plurality of application schedulers (generally designated 830) at level two 835, and an arbiter 840 accessible by the resource manager 820 at level one 825 and the application schedulers 830 at level two 835. The application schedulers 830 include, without limitation, a Dryad scheduler, a Hadoop scheduler, a service scheduler and a batch scheduler.

A plurality of users request resources (a user request) via a client (designated CT 1, 2, 3, 4 and generally designated 870) through a communication network 860 to execute applications from respective application schedulers 830. The application schedulers 830 request the resources represented by a resource view from the arbiter 840. The arbiter 840 categorizes ones of the user requests into scheduling units and provides the scheduling units to the resource manager 820. The resource manager 820 offers the resource views based on recommendations of the arbiter 840. The resource manager 820 shares the resource view among the application schedulers 830 grouped in a scheduling unit. Thus, the arbiter 840 classifies application schedulers 830 into different groups based on scheduling units as set forth above.

Turning now to FIGURE 9, illustrated is a block diagram of an embodiment of an arbiter 910. The arbiter 910 receives user requests (designated S1, S2, S3, S4, S5, S6) asking for resource views from a queue 920. The arbiter 910 receives the user requests from the application schedulers and retrieves generic profiles associated with the application schedulers from a profile database 930 that provides user parameters and constraints. Using the user requests and generic profiles, the arbiter 910 groups the user requests and respective application schedulers into a plurality of scheduling units. Each scheduling unit specifies which application schedulers can run in parallel, and which schedulers should run in isolation or on an isolated sub-view. The arbiter 910 produces an updated scheduling queue 940, where the user requests are grouped into the scheduling units. As illustrated in the example shown in FIGURE 9, the updated scheduling queue 940 is formed with user request S1 assigned to one queue, user requests S2, S3, S4 assigned to a second queue, and user requests S5, S6 assigned to a third queue.

Turning now to FIGURE 10, illustrated is a flow diagram of an embodiment of a method 1000 of operating a data processing system. The method 1000 is operable in an apparatus (including, for instance, an arbiter 840 of the scheduler 810 of FIGURE 8) of the data processing system (e.g., including a data center) and begins at a start step or module 1010. At a step or module 1020, the apparatus receives a request for resource views from application schedulers in response to a plurality of user requests to execute applications. The apparatus categorizes ones of the plurality of user requests into scheduling units, at a step or module 1030, to, for instance, reduce the risk of conflict between the application schedulers. Each of the scheduling units corresponds to the plurality of user requests that share ones of the resource views.
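The regrouping of FIGURE 9 can be sketched in a few lines, using an assumed in-memory stand-in for the profile database (the `isolate`/`group` fields are hypothetical names chosen for illustration, not fields from the disclosure):

```python
# Minimal sketch of FIGURE 9's regrouping: requests whose profiles
# mark them parallel-safe and share a group tag are placed in one
# scheduling unit; requests marked for isolation each get their own.

profiles = {  # hypothetical profile database contents
    "S1": {"isolate": True},
    "S2": {"isolate": False, "group": "batch"},
    "S3": {"isolate": False, "group": "batch"},
    "S4": {"isolate": False, "group": "batch"},
    "S5": {"isolate": False, "group": "service"},
    "S6": {"isolate": False, "group": "service"},
}

def regroup(queue, profiles):
    """Produce an updated scheduling queue of scheduling units."""
    units, by_group = [], {}
    for req in queue:
        p = profiles[req]
        if p["isolate"]:
            units.append([req])
        else:
            members = by_group.setdefault(p["group"], [])
            members.append(req)
            if len(members) == 1:   # first member: add the unit once
                units.append(members)
    return units

updated_queue = regroup(["S1", "S2", "S3", "S4", "S5", "S6"], profiles)
# → [['S1'], ['S2', 'S3', 'S4'], ['S5', 'S6']]
```

This reproduces the updated scheduling queue 940 of FIGURE 9: S1 in isolation, S2-S4 sharing one unit, and S5-S6 sharing another.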

At a step or module 1040, the apparatus provides the scheduling units to a resource manager to enable the resource manager to construct the resource views for the application schedulers. In accordance with providing the scheduling units, the apparatus may instruct the resource manager to build a shared resource view wherein ones of the application schedulers schedule resources in parallel on a full cluster with a low risk of conflict. In another embodiment in accordance with providing the scheduling units, the apparatus may instruct the resource manager to build an isolated resource view wherein there is a risk of conflict between ones of the plurality of user requests from corresponding ones of the application schedulers, the corresponding ones of the application schedulers running independently and sequentially on a full cluster. In another embodiment in accordance with providing the scheduling units, the apparatus may instruct the resource manager to build an isolated resource sub-view wherein ones of the application schedulers schedule resources that run on a partition of a cluster with a low risk of conflict. Also, the apparatus may instruct the resource manager to allocate resources in time and space to the application schedulers as indicated in the resource views and in accordance with providing the scheduling units.
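The three view types just described (a shared full-cluster view, an isolated locked view, and an isolated sub-view on a partition) might be modeled as follows. The `ResourceView` structure and `construct_view` logic are a sketch under assumed names, not the disclosed implementation.

```python
# Sketch of a resource manager constructing a view per scheduling unit.

from dataclasses import dataclass

@dataclass
class ResourceView:
    nodes: list    # cluster nodes visible in this view
    shared: bool   # may several schedulers claim from it concurrently?
    locked: bool   # offered sequentially, one unit at a time

def construct_view(unit, cluster_nodes, partition=None):
    if len(unit) > 1:          # low-risk group: shared view of the full cluster
        return ResourceView(cluster_nodes, shared=True, locked=False)
    if partition is not None:  # low-risk but confined: isolated sub-view
        return ResourceView(partition, shared=False, locked=False)
    # high-risk single scheduler: isolated, locked full view
    return ResourceView(cluster_nodes, shared=False, locked=True)

cluster = ["n1", "n2", "n3", "n4"]
v = construct_view(["AS3"], cluster)  # isolated full view, locked
```

A real resource manager would additionally track which claims each view has admitted, so that time-and-space allocation can be enforced against it.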

The resource views may allocate resources including, without limitation, servers, processors, memory, bandwidth of communication systems, transmit power levels, and intervals of time to operate the resources. The apparatus may operate within a two-level scheduler wherein the resource manager operates in a first level, the application schedulers operate in a second level, and the arbiter operates in a level accessible by the first level and the second level. The method 1000 ends at a step or module 1050.

Turning now to FIGURE 11, illustrated is a flow diagram of an embodiment of a method 1100 of operating a data processing system. The method 1100 is operable in an apparatus (including, for instance, a scheduler 810 of FIGURE 8) of the data processing system (e.g., including a data center) and begins at a start step or module 1110. At a step or module 1120, the apparatus receives a plurality of user requests to execute applications (for instance, at an application scheduler 830 of the scheduler 810 of FIGURE 8). At a step or module 1130, the apparatus categorizes ones of the plurality of user requests into scheduling units in response to the plurality of user requests (for example, by an arbiter 840 of the scheduler 810 of FIGURE 8) to, for instance, reduce the risk of conflict (e.g., between the application schedulers 830). Each of the scheduling units corresponds to the plurality of user requests that share ones of the resource views.

At a step or module 1140, the apparatus constructs resource views (e.g., for the application schedulers 830) in response to the scheduling units (for instance, by a resource manager 820 of the scheduler 810 of FIGURE 8). In accordance with constructing the resource views, the apparatus may build a shared resource view wherein resources are scheduled in parallel on a full cluster with a low risk of conflict. In accordance with constructing the resource views, the apparatus may build an isolated resource view wherein there is a risk of conflict between ones of the plurality of user requests from corresponding ones of application schedulers, the corresponding ones of the application schedulers running independently and sequentially on a full cluster. In accordance with constructing the resource views, the apparatus may build an isolated resource sub-view wherein ones of application schedulers schedule resources that run on a partition of a cluster with a low risk of conflict.

In a step or module 1150, the apparatus allocates resources (e.g., in time and space) as indicated in the resource views in accordance with the scheduling units (for instance, by the resource manager 820 of the scheduler 810 of FIGURE 8). The resource views may allocate resources including, without limitation, servers, processors, memory, bandwidth of communication systems, transmit power levels, and intervals of time to operate the resources. The method 1100 may operate within a two-level scheduler wherein the functions associated with the resource manager operate in a first level, the functions associated with the application schedulers operate in a second level, and the functions associated with the arbiter operate in a level accessible by the first level and the second level. The method 1100 ends at a step or module 1160.

Turning now to FIGURE 12, illustrated is a block diagram of an embodiment of a communication node 1200 in a data processing system. The communication node 1200 may be a server configured to perform the functions of a scheduler in a data center. The communication node 1200 includes a processor (or processing circuitry) 1210, a memory 1220 and a communication interface 1230. The communication node 1200 may also include an antenna(s) 1240 depending on the type of device such as a server with wireless communication capability. In particular embodiments, some or all of the functionality described herein may be provided by, without limitation, machine type communication (“MTC”) and machine-to-machine (“M2M”) devices, a radio base station, a radio network controller, and a data center (e.g., computer(s) that form a data center).

The functionality of the communication node 1200 may be provided by the processor 1210 executing instructions stored on a computer-readable medium, such as the memory 1220 shown in FIGURE 12. Alternative embodiments of the communication node 1200 may include additional components (such as the interfaces, devices and circuits mentioned above) beyond those shown in FIGURE 12 that may be responsible for providing certain aspects of the device’s functionality, including any of the functionality to support the solution described herein.

The processor 1210 (or processors), which may be implemented with one or a plurality of processing devices, performs functions associated with its operation including, without limitation, performing the operations of an arbiter, decoding of individual bits forming a communication message, formatting of information and overall control of a respective communication node 1200. Exemplary functions related to management of communication resources include, without limitation, hardware installation, traffic management, performance data analysis, configuration management, security, billing and the like. The processor 1210 may be of any type suitable to the local application environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (“DSPs”), field-programmable gate arrays (“FPGAs”), application-specific integrated circuits (“ASICs”), and processors based on a multi-core processor architecture, as non-limiting examples.

The processor 1210 may include, without limitation, application processing circuitry. In some embodiments, the application processing circuitry may be on separate chipsets. In alternative embodiments, part or all of the application processing circuitry may be combined into one chipset, and other application circuitry may be on a separate chipset. In still alternative embodiments, part or all of the application processing circuitry may be on the same chipset, and other application processing circuitry may be on a separate chipset. In yet other alternative embodiments, part or all of the application processing circuitry may be combined in the same chipset. The processor 1210 may be configured to perform any operations described herein. The operations as performed by the processor 1210 may include processing information obtained by the processor by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the respective device, and/or performing one or more operations based on the obtained information or converted information, and, as a result of the processing, making a determination.

The memory 1220 (or memories) may be one or more memories and of any type suitable to the local application environment, and may be implemented using any suitable volatile or nonvolatile data storage technology such as a semiconductor-based memory device, a magnetic memory device and system, an optical memory device and system, fixed memory and removable memory. The programs stored in the memory 1220 may include program instructions or computer program code that, when executed by an associated processor, enable the respective communication node 1200 to perform its intended tasks. Of course, the memory 1220 may form a data buffer for data transmitted to and from the same. Exemplary embodiments of the system, subsystems, and modules as described herein may be implemented, at least in part, by computer software executable by the processor 1210, or by hardware, or by combinations thereof.

The communication interface 1230 modulates information onto a carrier waveform for transmission by the respective communication node 1200 to another communication node. The communication interface 1230 also demodulates information received from another communication node for further processing. The communication interface 1230 can support duplex operation for the respective communication node 1200, and supports communication with a core network.

The antenna 1240 (or antennas), when applicable, may be any type of antenna capable of transmitting and receiving data and/or signals wirelessly. In some embodiments, the antenna 1240 may include one or more omni-directional, sector or panel antennas operable to transmit/receive radio signals between, for example, 2 gigahertz (“GHz”) and 66 GHz. An omni-directional antenna may be used to transmit/receive radio signals in any direction, a sector antenna may be used to transmit/receive radio signals from devices within a particular area, and a panel antenna may be a line of sight antenna used to transmit/receive radio signals in a relatively straight line. While the antenna 1240 facilitates wireless communication for the communication node 1200, the communication node 1200 may also communicate via a wired communication path via the communication interface 1230 and, in such instances, the antenna 1240 may not be necessary.

The subsystems as introduced above with respect to the preceding FIGURES may be embodied in the communication node 1200 performed by, for instance, the processor 1210 in conjunction with the memory 1220.

Thus, a system and method have been introduced herein to allocate resources by a plurality of application schedulers in a data processing system. The system and method can be performed in real-time, taking into account multiple criteria, to allocate resources within the data processing system.

In one embodiment (including representative reference numbers from the preceding FIGUREs), an apparatus (e.g., an arbiter 840 embodied in a server 1200) in a data processing system (800) includes processing circuitry (1210) configured to receive a request for resource views from application schedulers (830) in response to a plurality of user requests (from user equipment 870) to execute applications. The processing circuitry (1210) is also configured to categorize ones of the plurality of user requests into scheduling units to, for instance, reduce the risk of conflict between the application schedulers (830). Each of the scheduling units corresponds to the plurality of user requests that share ones of the resource views.

The processing circuitry (1210) is also configured to provide the scheduling units to a resource manager (820) to enable the resource manager (820) to construct the resource views for the application schedulers (830). In accordance with providing the scheduling units, the processing circuitry (1210) may instruct the resource manager (820) to build a shared resource view wherein ones of the application schedulers (830) schedule resources in parallel on a full cluster with a low risk of conflict. In another embodiment in accordance with providing the scheduling units, the processing circuitry (1210) may instruct the resource manager (820) to build an isolated resource view wherein there is a risk of conflict between ones of the plurality of user requests from corresponding ones of the application schedulers (830), the corresponding ones of the application schedulers (830) running independently and sequentially on a full cluster. In another embodiment in accordance with providing the scheduling units, the processing circuitry (1210) may instruct the resource manager (820) to build an isolated resource sub-view wherein ones of the application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict. Also, the processing circuitry (1210) may instruct the resource manager (820) to allocate resources in time and space to the application schedulers (830) as indicated in the resource views and in accordance with providing the scheduling units.

The resource views may allocate resources including, without limitation, servers, processors, memory, bandwidth of communication systems, transmit power levels, and intervals of time to operate the resources. The apparatus (840, 1200) may operate within a two-level scheduler (e.g., a scheduler 810 embodied in a server 1200) wherein the resource manager (820) operates in a first level (825), the application schedulers (830) operate in a second level (835), and the arbiter (840) operates in a level accessible by the first level (825) and the second level (835).

In one embodiment (including representative reference numbers from the preceding FIGUREs), a scheduler (e.g., a scheduler 810 embodied in a server 1200) in a data processing system (800) includes a resource manager (820), application schedulers (830) and an arbiter (840). The application schedulers (830) are configured to receive a plurality of user requests (from user equipment 870) to execute applications. The arbiter (840) is configured to receive a request for resource views from the application schedulers (830) in response to a plurality of user requests, and categorize ones of the plurality of user requests into scheduling units to, for instance, reduce the risk of conflict between the application schedulers (830). Each of the scheduling units corresponds to the plurality of user requests that share ones of the resource views.

The resource manager (820) is configured to receive the scheduling units, and construct the resource views for the application schedulers (830). In accordance with the scheduling units, the resource manager (820) may build a shared resource view wherein ones of the application schedulers (830) schedule resources in parallel on a full cluster with a low risk of conflict. In another embodiment in accordance with the scheduling units, the resource manager (820) may build an isolated resource view wherein there is a risk of conflict between ones of the plurality of user requests from corresponding ones of the application schedulers (830), the corresponding ones of the application schedulers (830) running independently and sequentially on a full cluster. In another embodiment in accordance with the scheduling units, the resource manager (820) may build an isolated resource sub-view wherein ones of the application schedulers (830) schedule resources that run on a partition of a cluster with a low risk of conflict. Also, the resource manager (820) may allocate resources in time and space to the application schedulers (830) as indicated in the resource views and in accordance with the scheduling units. The resource views may allocate resources including, without limitation, servers, processors, memory, bandwidth of communication systems, transmit power levels, and intervals of time to operate the resources. The resource manager (820) may operate in a first level (825), the application schedulers (830) may operate in a second level (835), and the arbiter (840) may operate in a level accessible by the first level (825) and the second level (835).

The foregoing description of embodiments of the present proposed solution has been presented for the purpose of illustration and description. It is not intended to be exhaustive or to limit the proposed solution to the precise form disclosed. Alterations, modifications and variations can be made without departing from the spirit and scope of the present proposed solution.

As described above, the exemplary embodiment provides both a method and corresponding apparatus consisting of various modules providing functionality for performing the steps of the method. The modules may be implemented as hardware (embodied in one or more chips including an integrated circuit such as an application specific integrated circuit), or may be implemented as software or firmware for execution by a processor. In particular, in the case of firmware or software, the exemplary embodiment can be provided as a computer program product including a computer readable storage medium embodying computer program code (i.e., software or firmware) thereon for execution by the computer processor. The computer readable storage medium may be non-transitory (e.g., magnetic disks; optical disks; read only memory; flash memory devices; phase-change memory) or transitory (e.g., electrical, optical, acoustical or other forms of propagated signals, such as carrier waves, infrared signals, digital signals, etc.). The coupling of a processor and other components is typically through one or more busses or bridges (also termed bus controllers). The storage device and signals carrying digital traffic respectively represent one or more non-transitory or transitory computer readable storage media. Thus, the storage device of a given electronic device typically stores code and/or data for execution on the set of one or more processors of that electronic device such as a controller.

Although the embodiments and their advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the spirit and scope thereof as defined by the appended claims. For example, many of the features and functions discussed above can be implemented in software, hardware, or firmware, or a combination thereof. Also, many of the features, functions, and steps of operating the same may be reordered, omitted, added, etc., and still fall within the broad scope of the various embodiments.

Moreover, the scope of the various embodiments is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized as well.

Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.