In this video, we sit down to discuss workload management: what it is, what tools are available, and the unique features of the YellowDog Workload Manager.
So, workload management is all about executing a series of jobs, or work, on a set of servers. Workload management tools can do this, resubmitting work when tasks fail and launching work in a selected sequence, for example when one piece of work depends on another that has already been completed.
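To make the two ideas above concrete (resubmitting failed work and launching work in dependency order), here is a minimal sketch in Python. It is not how any particular workload manager is implemented; the names `run_with_retries`, `run_workload`, and the example task names are all hypothetical, and the ordering relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_with_retries(task, max_attempts=3):
    """Run a callable, resubmitting it on failure up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            # Give up only once the final attempt has failed.
            if attempt == max_attempts:
                raise


def run_workload(tasks, dependencies, max_attempts=3):
    """Run tasks so that every task starts after its prerequisites.

    tasks: mapping of task name -> callable
    dependencies: mapping of task name -> set of prerequisite task names
    Returns the order in which the tasks completed.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        run_with_retries(tasks[name], max_attempts)
        completed.append(name)
    return completed


# Hypothetical example: "render" fails once before succeeding.
attempts = {"render": 0}


def flaky_render():
    attempts["render"] += 1
    if attempts["render"] < 2:
        raise RuntimeError("simulated task failure")


tasks = {"prepare": lambda: None, "render": flaky_render, "publish": lambda: None}
dependencies = {"render": {"prepare"}, "publish": {"render"}}
completed = run_workload(tasks, dependencies)
```

Here `render` is resubmitted after its first failure, and `publish` only starts once `render` has finished.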
Some of the grids that are launched could have thousands of nodes on them, so it’s really important to have a tool that has a central console for the management of the various workstreams – especially if you are executing multiple workloads simultaneously.
Another thing that you need to consider is monitoring the actual execution itself. In this context, the execution on servers can be broken down into two things:
There are a number of potential server locations, including:
To give an idea of the workload management tools out there, here is a selection:
Alongside these, you also have workload management solutions that come from the cloud providers themselves, for example:
There are also some open-source solutions available – in particular, Slurm is very, very popular.
All of the above are great options. Some of them are more attuned to working on-premise and some are more attuned to working in the cloud.
Ultimately, YellowDog’s Workload Manager is cloud native – it has been built to work in the cloud.
One of the key things, when working in the cloud, is recognising that the environment is completely different. For example, when you execute workloads on-premise, your machines tend to be up and running, you have a very stable network, and the grid is available to you when you start.
In the cloud, by contrast, you have a different environment and different capabilities. If you’re using things like Spot or pre-emptible machine types, there’s a possibility you might not get them, or they might be taken away when you’re in the middle of executing a workload.
So, one of the things that the YellowDog Workload Manager does is assume failure and handle it automatically by resubmitting jobs. Alongside this, it also asks for more compute when it’s required.
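As an illustration of the "assume failure" idea, the sketch below simulates Spot-style machines that are reclaimed after a fixed number of tasks. It is a toy model, not YellowDog's implementation: `Preempted`, `make_spot_worker_factory`, and `run_until_done` are hypothetical names, and real provisioning would call a cloud provider rather than create a local function.

```python
from collections import deque


class Preempted(Exception):
    """Raised when a (simulated) Spot instance is reclaimed mid-task."""


def make_spot_worker_factory(tasks_before_preemption):
    """Return a function that 'provisions' workers with a limited lifetime."""
    def provision():
        remaining = tasks_before_preemption

        def worker(job):
            nonlocal remaining
            if remaining == 0:
                raise Preempted(job)
            remaining -= 1
            return f"{job}: done"

        return worker

    return provision


def run_until_done(jobs, provision_worker):
    """Run every job, resubmitting on preemption and provisioning as needed."""
    pending = deque(jobs)
    workers = [provision_worker()]
    provisioned = 1
    results = {}
    while pending:
        if not workers:
            # No compute left: ask for more.
            workers.append(provision_worker())
            provisioned += 1
        job = pending.popleft()
        worker = workers[0]
        try:
            results[job] = worker(job)
        except Preempted:
            # The machine was taken away: drop it and resubmit the job.
            workers.remove(worker)
            pending.append(job)
    return results, provisioned


results, provisioned = run_until_done(
    ["a", "b", "c", "d", "e"], make_spot_worker_factory(2)
)
```

Every job completes even though each simulated machine disappears after two tasks; the loop simply requeues the interrupted job and provisions a replacement.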
The other thing the YellowDog Workload Manager does is work across multiple machine types and regions. So, you can have, for example, workloads being executed on servers across the UK, France and Germany, which would be shown to the YellowDog Workload Manager as a single cluster.
In addition to this, the Workload Manager can also combine resources from multiple cloud providers, and again, show this as one cluster.
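One way to picture "one cluster" spanning regions and providers is a single collection of nodes, each tagged with where it came from. The sketch below is purely illustrative: the `Node` and `Cluster` types, the provider labels, and the machine type names are assumptions, not part of any real API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    provider: str      # hypothetical label, e.g. "provider-a"
    region: str
    machine_type: str  # hypothetical label
    vcpus: int


class Cluster:
    """A single logical pool of nodes, regardless of provider or region."""

    def __init__(self):
        self._nodes = []

    def add(self, node):
        self._nodes.append(node)

    def total_vcpus(self):
        return sum(node.vcpus for node in self._nodes)

    def nodes_by_location(self):
        """Count nodes per (provider, region) pair."""
        return Counter((node.provider, node.region) for node in self._nodes)


cluster = Cluster()
cluster.add(Node("provider-a", "UK", "small", 4))
cluster.add(Node("provider-a", "France", "large", 16))
cluster.add(Node("provider-b", "Germany", "small", 4))

total = cluster.total_vcpus()
locations = cluster.nodes_by_location()
```

A scheduler working against `cluster` sees 24 vCPUs in one pool; the provider and region are just attributes of each node.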
Finally, the YellowDog Workload Manager uses what we would call a ‘pull’ model, where the workers contact the Scheduler for work, rather than the other way around. When the workers come up, they alert the Scheduler saying, “Hey, look I’m here, I’m available for work.” The Scheduler then hands work to them accordingly.
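The pull model described above can be sketched in a few lines. This is a simplified, in-process illustration rather than the real Scheduler protocol: the `Scheduler` class and its `register`, `request_work`, and `report_done` methods are hypothetical, and threads stand in for separate worker machines.

```python
import queue
import threading


class Scheduler:
    """Holds the work; it never contacts workers on its own initiative."""

    def __init__(self, tasks):
        self._tasks = queue.Queue()
        for task in tasks:
            self._tasks.put(task)
        self._lock = threading.Lock()
        self.registered = []
        self.completed = {}

    def register(self, worker_id):
        # "Hey, look I'm here, I'm available for work."
        with self._lock:
            self.registered.append(worker_id)

    def request_work(self, worker_id):
        """Return the next task, or None when there is nothing left."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def report_done(self, worker_id, task):
        with self._lock:
            self.completed[task] = worker_id


def worker_loop(worker_id, scheduler):
    """A worker announces itself, then keeps asking for work until none remains."""
    scheduler.register(worker_id)
    while (task := scheduler.request_work(worker_id)) is not None:
        scheduler.report_done(worker_id, task)


task_names = [f"task-{i}" for i in range(10)]
scheduler = Scheduler(task_names)
threads = [
    threading.Thread(target=worker_loop, args=(f"worker-{n}", scheduler))
    for n in (1, 2)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because workers initiate every exchange, the Scheduler does not need to know in advance which machines exist; a worker that comes up late, or replaces a pre-empted one, simply registers and starts asking for work.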