CoreWeave offers cloud-based Grace Blackwell GPUs for AI training

Cloud services provider CoreWeave has announced it is offering Nvidia’s GB200 NVL72 systems, otherwise known as “Grace Blackwell,” to customers looking to do intensive AI training.

CoreWeave said its portfolio of cloud services is optimized for the GB200 NVL72, including CoreWeave’s Kubernetes Service, Slurm on Kubernetes (SUNK), Mission Control, and other services. CoreWeave’s Blackwell instances scale up to 110,000 Blackwell GPUs connected over Nvidia Quantum-2 InfiniBand networking.

The GB200 NVL72 is a massive, powerful system that wires together 36 Grace CPUs and 72 Blackwell GPUs so that they appear to software as a single, enormous processor. It is designed for training and running advanced large language models.

CoreWeave is not the only company putting Grace Blackwell to work. Customers including Cohere, IBM, and Mistral AI are all using the hardware for model training and deployment. Cohere is using the Nvidia hardware to help develop secure enterprise AI applications on its enterprise AI platform, North.

IBM uses CoreWeave services to train its Granite open-source AI models, which power IBM watsonx Orchestrate for building and deploying AI agents, while Mistral AI plans to build the next generation of its open-source AI models on Blackwell.

Cohere said it was seeing three times the performance when training LLMs compared with the previous Hopper generation of GPUs, and Mistral said that, out of the box and without any further optimizations, it saw a twofold improvement in performance for dense model training.

Grace Blackwell instances are available now to CoreWeave customers.
