YellowDog Platform

With the YellowDog Platform, you can benefit from everything the cloud has to offer, managing your workloads across different geographical regions, machine shapes and instance lifecycles.

YellowDog is an established AWS HPC Competency Partner.

YellowDog manages the full workload lifecycle, combining a powerful scheduler with an intelligent orchestration engine tuned for provisioning cloud resources at scale. This enables fine-grained control and prioritisation of workload scheduling across cloud, hybrid and multi-cloud environments.

YellowDog Platform delivers


 


Comprehensive instance type support

YellowDog maximises flexibility by supporting Linux and Windows, container architectures and all xPU types (Intel, AMD, NVIDIA, AWS Trainium/Inferentia, etc.), including Graviton/ARM-based instances, which are more cost- and energy-efficient.

Advanced instance selection

Instance selection can be policy-based (e.g., waterfall) or attribute-based (e.g., based on price, powered by YellowDog Insights), automatically selecting the best source of compute to meet customer requirements and preferences. Computing clusters are composable, expanding and contracting as required to respond to the specific needs of your workload.
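To illustrate the difference between the two selection styles, here is a minimal sketch. It is not the YellowDog API: `ComputeSource`, `select_waterfall`, `select_by_price`, the source names and the prices are all hypothetical, and a real selection would weigh more attributes than price alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ComputeSource:
    """A hypothetical source of compute (names and prices are illustrative)."""
    name: str
    price_per_hour: float
    available: int  # instances currently obtainable from this source


def select_waterfall(sources: List[ComputeSource], needed: int) -> Dict[str, int]:
    """Policy-based: walk the sources in the given preference order,
    taking as many instances as each can supply until demand is met."""
    plan: Dict[str, int] = {}
    remaining = needed
    for source in sources:
        if remaining == 0:
            break
        take = min(source.available, remaining)
        if take > 0:
            plan[source.name] = take
            remaining -= take
    return plan


def select_by_price(sources: List[ComputeSource], needed: int) -> Dict[str, int]:
    """Attribute-based: rank the sources by price first, then fill in that order."""
    cheapest_first = sorted(sources, key=lambda s: s.price_per_hour)
    return select_waterfall(cheapest_first, needed)


sources = [
    ComputeSource("spot-a", 0.12, 50),
    ComputeSource("on-demand-a", 0.40, 1000),
    ComputeSource("spot-b", 0.10, 30),
]

print(select_waterfall(sources, 80))  # fills in the listed order
print(select_by_price(sources, 80))   # fills cheapest sources first
```

In this sketch the waterfall policy respects the order the customer lists, while the price-based selection re-ranks the same sources before filling demand.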

Resiliency

YellowDog is Spot pre-emption aware: it can rapidly move tasks to alternate nodes and re-provision pre-empted instances to minimise downtime, enabling customers to use Spot instances with confidence.
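The general pattern behind pre-emption handling can be sketched as follows. This is a simplified illustration under stated assumptions, not YellowDog's implementation: the `Cluster` class, its methods and the node names are hypothetical, and tasks are simply moved to the least-loaded surviving node.

```python
from collections import deque


class Cluster:
    """Hypothetical in-memory model of a cluster reacting to Spot pre-emption."""

    def __init__(self, nodes):
        self.running = {node: [] for node in nodes}  # node -> tasks on it
        self.pending = deque()                       # tasks with no node yet
        self.replacements_requested = 0              # re-provisioning requests

    def submit(self, task):
        """Place a task on the node that currently has the fewest tasks."""
        node = min(self.running, key=lambda n: len(self.running[n]))
        self.running[node].append(task)

    def on_preemption_notice(self, node):
        """Move tasks off a pre-empted node and request a replacement."""
        displaced = self.running.pop(node)
        self.replacements_requested += 1
        for task in displaced:
            if self.running:
                self.submit(task)          # alternate node is available
            else:
                self.pending.append(task)  # wait for the replacement node


cluster = Cluster(["n1", "n2", "n3"])
for i in range(1, 7):
    cluster.submit(f"task-{i}")

cluster.on_preemption_notice("n2")
print(cluster.running)
```

After the notice, every task from the pre-empted node is running elsewhere and one replacement instance has been requested.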

Rapid scale

YellowDog was designed from the ground up to facilitate massive scale. In partnership with AWS, it built one of the world’s largest supercomputers in the cloud, hitting 1 million vCPUs within 7 minutes and reaching 3.2 million vCPUs in just over half an hour, as well as demonstrating throughput of 3,000 tasks per second. The YellowDog Platform is capable of scaling to 200,000 nodes and more than 30 million vCPUs.
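To put those figures in perspective, the short calculation below derives two rates from them. It uses only the numbers quoted above; the variable names are illustrative, and the ramp rate is a lower bound on the average because the milestone was reached "within" 7 minutes.

```python
# Figures quoted in the text above.
vcpus_reached = 1_000_000
ramp_seconds = 7 * 60
tasks_per_second = 3_000

# Average provisioning rate needed to reach 1 million vCPUs in 7 minutes.
min_avg_ramp = vcpus_reached / ramp_seconds

# Tasks processed in one hour at the demonstrated throughput.
tasks_per_hour = tasks_per_second * 3600

print(f"At least {min_avg_ramp:.0f} vCPUs provisioned per second on average")
print(f"{tasks_per_hour:,} tasks per hour at sustained throughput")
```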

Data Anywhere

YellowDog helps customers to optimally locate their data for workload mobility in hybrid and multi-region deployments, maximising cost efficiency whilst managing any constraints around data gravity, latency and confidentiality.

Cost efficiency

YellowDog helps customers become more cost efficient through the use of low-price Spot instances, high node utilisation, reduced compute wastage from over-provisioning, and a lower engineering burden in managing cloud resources across cloud service providers (CSPs). It also minimises data transfer and storage costs by optimising workload mobility.

YellowDog Portal

The YellowDog Portal is a user-friendly web-based interface (‘single pane of glass’) that provides a real-time dashboard for monitoring usage, including the ability to monitor performance down to the node level (CPU and memory utilisation), and tools for managing compute provisioning, object storage, work scheduling and platform administration.

Security

YellowDog’s security-by-design approach is ISO 27001 certified and reflects our dedication to ensuring the confidentiality, integrity and availability of our platform and your data.

FinOps

The YellowDog Portal provides visibility of compute usage by user and time period, as well as the ability to monitor and manage access to resources at the user level with hard and soft limits, enabling full FinOps control.
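One common way to interpret hard and soft limits is sketched below. The semantics here are an assumption rather than documented YellowDog behaviour: a soft limit triggers a warning while still allowing the request, and a hard limit blocks it. The `Decision` enum and `check_usage` function are hypothetical names.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"   # assumed meaning of exceeding a soft limit
    DENY = "deny"   # assumed meaning of exceeding a hard limit


def check_usage(used_hours: float, requested_hours: float,
                soft_limit: float, hard_limit: float) -> Decision:
    """Classify a request by the usage it would produce if granted."""
    projected = used_hours + requested_hours
    if projected > hard_limit:
        return Decision.DENY
    if projected > soft_limit:
        return Decision.WARN
    return Decision.ALLOW


# A user with 70 compute hours used, a soft limit of 80 and a hard limit of 100.
for request in (5, 20, 40):
    print(request, check_usage(70, request, soft_limit=80, hard_limit=100).value)
```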

Trusted by leaders in investment, biotechnology and computing

Integrations

Workflow tools

Integration with popular workflow tools such as Ray (distributed compute, ML), Apache Airflow (data engineering, ML) and Nextflow (life sciences) enables customers to keep their existing workload infrastructure and pipelines whilst taking advantage of YellowDog’s intelligent cloud provisioning.

The YellowDog Platform is also configured and managed via a comprehensive set of REST APIs and SDKs (Python, Java and C++) for easy integration into a customer’s DevOps and CI/CD processes.

Cloud providers

Fully integrated with all major cloud service providers (CSPs).

Uses the latest CSP APIs, such as the AWS EC2 Fleet and Spot Fleet APIs, to launch a fleet of thousands of Amazon EC2 instances in a single operation.

Data/Storage

Out-of-the-box connectors to object storage services, both from cloud providers (Azure Blob, Amazon S3, Google Cloud Storage, etc.) and from third parties (VAST, Weka), as well as integration with HPC cluster file systems.

Simplifies data access and data transfer across multiple cloud providers, and works with YellowDog’s Data Anywhere to make workload mobility easier.

Third-party schedulers

Through integrations with popular schedulers (Slurm, IBM Symphony, LSF, Moab, PBS, Grid Engine, GridServer), YellowDog can act as a unified workload submission platform across both cloud and on-premises resources.

Detailed technical information is available in our Dev Portal.

Choose YellowDog to accelerate your workloads... so you can make decisions faster