While the term Private AI has been around for more than seven years, it was often narrowly defined around niche use cases. We saw Private AI’s opportunity and market reach as far more impactful, and we have worked throughout the past year to add clarity, context, and innovation to what has now become a high-growth market segment.
When we first shared our thoughts on Private AI at last year’s VMware Explore conference, we said it marked a new way to bring the AI model to customer data. We spoke about Private AI not as a product, but as a powerful architectural approach that could give customers the benefits of AI without compromising control of their data, privacy, or compliance.
Why has it resonated?
A year ago, customers were being told that AI was beyond their reach because they would need hundreds to thousands of GPUs to get started. And since they couldn’t source the necessary processing power, their only option was to run all their services with the public cloud providers. Looking back, fine-tuning the Hugging Face StarCoder model on a single NVIDIA A100 GPU was our first “aha” moment. The starting costs for AI turned out to be far lower than we thought, and when we moved services to production, we found it far less expensive to run AI inferencing services in our own data centers. This, in turn, had a direct impact on our AI product strategy and roadmap.
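To make the single-GPU point concrete, here is a minimal sketch of parameter-efficient (LoRA) fine-tuning of a StarCoder-family model, assuming the Hugging Face transformers and peft libraries. The model name, dataset file, and hyperparameters below are illustrative assumptions, not the configuration we actually used.

```python
# A minimal sketch, assuming the peft and transformers libraries; the model variant,
# dataset file, and hyperparameters are illustrative, not our actual configuration.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "bigcode/starcoderbase-1b"  # assumption: a small StarCoder-family model
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

# LoRA trains a small set of adapter weights, keeping memory within a single A100.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in StarCoder-family models
    task_type="CAUSAL_LM"))

# Assumption: a plain-text corpus of internal code to adapt the model to.
dataset = load_dataset("text", data_files="internal_code_corpus.txt")["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="starcoder-lora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```

The point is not the specific libraries; it is that parameter-efficient techniques make a single-GPU starting point realistic, which is what changed our cost assumptions.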
Private AI: One Year Later
In the intervening year, we’ve witnessed wide acceptance of our approach. Private AI is now tracked as a market category by leading industry analyst firms, commercial Private AI solutions are on the market, and customers frequently ask us for conversations about Private AI.
In conversations with the heads of AI at nearly 200 end-user organizations, it has become clear to me that organizations will leverage both public clouds and private data centers (owned or leased capacity) to meet their needs. SaaS AI services have proven their value in a range of use cases, including marketing content and demand generation; however, there are also many use cases where privacy, control, or compliance requires a different approach. We have seen customers start AI applications in a public cloud and then deploy them to a private data center for several reasons:
- Cost - Customers with mature AI environments have told me that running Private AI costs them three to five times less than comparable public cloud AI services. When they use open source models and manage their own AI infrastructure, they also get a predictable cost model, as opposed to the token-based billing they have grown used to with public AI services, which can make costs swing unpredictably from month to month.
- Privacy and Control - Organizations want to maintain physical control of their data and run AI models adjacent to their existing data sources. They don’t want to assume any risk of data leakage, whether it’s real or perceived.
- Flexibility - The AI space is moving so fast that it is not pragmatic to bet on a single vertical stack for all of your AI needs. Instead, a platform that lets you share a common pool of AI infrastructure gives you the flexibility to add new AI services, run A/B tests, and swap out AI models as the market evolves.
The Latest from VMware and NVIDIA
Launched with much fanfare at Explore 2023, VMware Private AI Foundation with NVIDIA became generally available this past May. Since then, we have seen tremendous demand for the platform across all major industry verticals, including the public sector. At this year’s show, we announced new capabilities that are available today, and we previewed what’s to come when VMware Cloud Foundation 9 becomes available.
Today, we introduced a new model store that enables MLOps teams and data scientists to curate and deliver more secure LLMs, with integrated role-based access control to help ensure governance and security for the environment as well as the privacy of enterprise data and IP. This new feature is based on the open source Harbor container registry, allowing models to be stored and managed as OCI-compliant containers, and includes native NVIDIA NGC and Hugging Face integrations (including Hugging Face CLI support) for a simple experience for data scientists and application developers. Additionally, we’re adding guided deployment to automate the workload domain creation workflow and the setup of other infrastructure components of VMware Private AI Foundation with NVIDIA, accelerating deployment and further reducing administrative tasks for a faster time to value.
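For readers curious about the underlying pattern, here is a hedged sketch of the generic “pull, package, push” flow that a Harbor-backed, OCI-compliant model store implies: fetch a candidate model from Hugging Face, package it, and push it into a registry project where access controls apply. The registry URL, project name, and use of the oras CLI are illustrative assumptions, not the product’s actual ingestion workflow.

```python
# A hedged sketch of the generic “pull, package, push” flow, not the product’s
# actual ingestion workflow. The registry URL, Harbor project, and use of the
# oras CLI (installed separately) are illustrative assumptions.
import subprocess
import tarfile
from huggingface_hub import snapshot_download

# 1. Pull a candidate model from Hugging Face for curation and review.
local_dir = snapshot_download(
    repo_id="bigcode/starcoderbase-1b",        # assumption: example model
    local_dir="models/starcoderbase-1b")

# 2. Package the model directory as a single artifact.
with tarfile.open("starcoderbase-1b.tar.gz", "w:gz") as tar:
    tar.add(local_dir, arcname="starcoderbase-1b")

# 3. Push it to a Harbor project as an OCI artifact (oras is one generic OCI client).
subprocess.run(
    ["oras", "push",
     "harbor.internal.example/ai-models/starcoderbase-1b:1.0",
     "starcoderbase-1b.tar.gz"],
    check=True)
```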
Further, several exciting capabilities showcased at Explore are planned for VCF 9:
- Data Indexing and Retrieval Service - Chunk, index, and vectorize data, then make it available through updatable knowledge bases with a configurable refresh policy that keeps model output current (a minimal sketch of this pattern appears after this list).
- AI Agent Builder Service - Use natural language to quickly build AI agents such as chatbots to realize quick time-to-value for new AI applications.
- vGPU profile visibility - Centrally view and manage vGPU profiles across your clusters, providing a holistic view of utilization and available capacity.
- GPU Reservations - Reserve capacity to accommodate larger vGPU profiles, ensuring that smaller vGPU workloads do not monopolize capacity and leave insufficient headroom for larger workloads.
- GPU HA via preemptible VMs - Through the use of VM classes, you will be able to utilize 100% of your GPU capacity, then snapshot and gracefully shut down non-mission-critical VMs (e.g., prioritizing production over research) when capacity is needed.
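As referenced in the Data Indexing and Retrieval Service item above, here is a minimal sketch of the chunk, index, and retrieve pattern such a service automates. The libraries (sentence-transformers, faiss) and sample documents are illustrative assumptions, not the VCF 9 implementation; the managed service adds updatable knowledge bases and a configurable refresh policy on top of this basic loop.

```python
# A minimal sketch, assuming sentence-transformers and faiss-cpu; the sample
# documents and chunking logic are illustrative, not the VCF 9 implementation.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Private AI keeps models adjacent to enterprise data sources.",
    "VMware Cloud Foundation pools GPU capacity across clusters.",
]

def chunk(text, size=200):
    """Naive fixed-size chunking; a production service would use smarter splitters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = [piece for doc in documents for piece in chunk(doc)]

# Vectorize the chunks and build an in-memory index.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = encoder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])  # inner product on normalized vectors = cosine
index.add(np.asarray(vectors, dtype="float32"))

def retrieve(question, k=2):
    """Return the top-k chunks; a refresh policy would re-embed changed sources."""
    query = encoder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query, dtype="float32"), k)
    return [chunks[i] for i in ids[0]]

print(retrieve("Where should AI models run relative to the data?"))
```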
Why Us?
Organizations have chosen to move forward with VMware, a part of Broadcom, as their strategic AI partner for many reasons:
- Lower TCO - AI applications are complex and require considerable intelligence at the infrastructure layer to meet performance and availability requirements. That starts with simplifying and standardizing your infrastructure. It is why organizations are building their AI infrastructure on VMware Cloud Foundation, which has shown dramatically lower TCO than alternatives. As mentioned before, running AI services on a virtualized, shared infrastructure platform can also lead to far lower and more predictable costs than comparable public AI services. When organizations virtualize and share capacity among data scientists and AI applications, they capture the economic benefits themselves; when they consume public AI services, the provider’s ability to virtualize and share capacity goes to the provider’s profit margin. Best of all, you can virtualize infrastructure for AI without sacrificing performance, and in some cases you will see better performance than bare metal.
- Resource sharing - Resource scheduling is one of the most complex aspects of AI operations, and the VMware Distributed Resource Scheduler (DRS) has continued to evolve for nearly 20 years. Our technology lead in this space allows organizations to virtualize and intelligently share GPUs, networks, memory, and compute capacity, driving automated provisioning and load balancing. Our innovation leadership is a key reason why organizations that have tried operating their own homegrown AI platforms have turned to VMware Private AI Foundation with NVIDIA.
- Automation - Our ability to safely automate the delivery of AI app stacks within minutes, and to keep driving automation beyond Day 2, has also been a key factor fueling excitement and adoption. This ranges from building new AI workstations to bringing NVIDIA Inference Microservices (NIMs) to production (see the sketch after this list).
- Centralized Ops - Customers cite centralized operations as another key benefit we provide. Organizations are able to use the same set of tools and processes for both AI and non-AI services, which further reduces their TCO for AI applications. This also includes centralized monitoring of your GPU estate.
- Trust - Organizations have depended on VMware technologies to run some of their most critical applications over many years. They are excited about our Private AI roadmap, and trust us to deliver.
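As referenced in the Automation item above, here is a hedged sketch of what bringing a NIM to production looks like from the application side. NIMs expose an OpenAI-compatible API, so a standard client works; the endpoint URL and model name below are placeholders for your own deployment.

```python
# A hedged sketch, assuming a NIM is already running on private infrastructure.
# The endpoint URL and model name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://nim.internal.example:8000/v1",  # assumption: local NIM endpoint
    api_key="not-used")                              # many private endpoints ignore the key

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # assumption: whichever model the NIM serves
    messages=[{"role": "user", "content": "Summarize our GPU reservation policy."}],
    max_tokens=128)

print(response.choices[0].message.content)
```

Because the API surface is standard, the same application code works whether the model runs in a public cloud or a private data center.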
Private AI: It’s All About the Ecosystem
Time has also shown us that there will not be a singular solution for AI. This is truly an ecosystem game, and we are continuing to push forward to build the best possible ecosystem for VMware Private AI with partners of all sizes. Today at Explore we announced new or expanded efforts with the following partners:
- Intel: We announced that VMware Private AI for Intel will support Intel Gaudi 2 AI Accelerators, providing more choice and unlocking more use cases for customers with high performance acceleration for GenAI and LLMs.
- Codeium: Accelerates time to delivery with a powerful AI coding assistant that helps developers with code generation, debugging, testing, modernization, and more.
- Tabnine: Provides a robust AI code assistant that streamlines code generation and automates mundane tasks, allowing developers to spend more time on value-added work.
- WWT: WWT is a leading technology solution provider and Broadcom partner for full stack AI solutions. To date, WWT has developed and supported AI applications for more than 75 organizations and works with us to empower clients to quickly realize value from Private AI, from deploying and operating infrastructure to AI applications and other services.
- HCLTech: Provides a Private Gen AI offering designed to help enterprises accelerate their Gen AI journey through a structured approach. Paired with a customized pricing model and HCLTech's data and AI services, this turnkey solution enables customers to move from Gen AI POC to production more quickly, with a clearly defined TCO.
Looking Ahead
It’s clear that AI is going to become even more mainstream in the coming years as organizations tap its power to help humans become more productive and innovative. But that also puts the onus on companies to ensure their infrastructures are sufficiently robust to handle this accelerating transition.
A year ago, we made the case that the AI space was moving so rapidly that customers shouldn't bet on a single solution. They would be better prepared for the future if they invested in a platform that would give them enough flexibility to meet new moments. When requirements changed or a better AI model came along, we argued, this platform approach would facilitate internal adoption. It was also clear to us that there was growing demand to run AI models adjacent to anywhere organizations have data, and that privacy, control, and a lower TCO would drive architecture and purchasing decisions.
A year later, I’m even more convinced that we are on the right path. Best of all, there’s so much more to come.