The Broadcom Infrastructure Software Group has spent the past several weeks testing the latest generation of frontier AI models against some of our own production code. We want to share what we learned because the implications matter for every organization that depends on software.
Our ultimate findings were jolting, but the team’s initial impressions could best be described as “impressive but not groundbreaking.” We provided the models with source code and asked them to find vulnerabilities. Most of what they found on a first pass was not exploitable in a production context: real findings, but without the kind of operational grounding that distinguishes a defect from an exploitable vulnerability. If we had stopped there, we might have concluded the hype had outrun the reality. Recent third-party benchmarking shows progress against capture-the-flag scenarios is linear rather than exponential — consistent with our initial read.
The Reframe
We did not stop there. What we learned is that the useful mental model for this technology is not a vulnerability “scanner.” The models are non-deterministic; they do not work like static analyzers. They work like a security engineer you direct and support — or like a threat actor, if you are on the receiving end. Skilled human researchers who hunt zero-days use a range of tools, reason carefully about their target, and iterate on hypotheses. The models should be given the same working environment.
Our change in methodology changed the results substantially. We gave the models threat models for each component we tested. We gave the models access to both source code and a live environment so they could validate their own findings. We required proof-of-concept exploit code for every claimed vulnerability, and a proposed fix. We told the models, explicitly, that we would not take the findings at face value and that we would reject anything that couldn’t be proven. That combination — context, tooling, direction, adversarial validation — is the discipline that turns frontier AI from an impressive demo into a functional part of a security program.
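The validation gate at the heart of that methodology can be sketched in code. The structure below is our own illustration for this post, not Broadcom’s actual tooling; the `Finding` type, `accept` function, and example claims are all hypothetical. The point it shows is the rule stated above: a claimed vulnerability without runnable proof-of-concept code and a proposed fix is rejected outright, and even a complete claim is accepted only if its PoC reproduces against a live environment.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Finding:
    component: str
    claim: str
    poc: Optional[Callable[[], bool]]   # runnable proof-of-concept, or None
    proposed_fix: Optional[str]         # proposed patch description, or None

def accept(finding: Finding) -> bool:
    """Adversarial validation: reject any claim that cannot be proven."""
    if finding.poc is None or finding.proposed_fix is None:
        return False        # no exploit code or no fix: rejected outright
    return finding.poc()    # the PoC must actually reproduce the issue

# A plausible-sounding claim with no exploit code is rejected; a claim
# whose PoC reproduces against the live environment is accepted.
unproven = Finding("auth", "possible bypass", poc=None, proposed_fix=None)
proven = Finding("auth", "token replay", poc=lambda: True,
                 proposed_fix="bind tokens to a per-session nonce")
```

In practice the `poc` callable would drive a real exploit attempt against a staging target; here it is stubbed to keep the sketch self-contained.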
The Findings
With that methodology, the results were not subtle. We are making discoveries about our code at a rate an order of magnitude higher than before. More importantly, we are learning things that would likely never have been uncovered by human researchers alone — the kind of issues that only become visible when something can reason combinatorially across a large codebase at machine speed.
The capability that most changes the defensive picture is chaining. Some models are effective not only at finding individual bugs but at combining two or three lower-severity issues into a single higher-severity exploit path. A vulnerability that a human team might have triaged as “low” becomes consequential in combination with two others. This is the capability that breaks triage-based vulnerability management as a discipline.
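Chaining can be thought of as a path search over capabilities: each finding requires some level of access and grants another, and a chain exists when the grants link up from an unauthenticated starting point to a high-impact goal. The sketch below is purely illustrative — the finding names, capability labels, and severities are invented for this post — but it shows why three individually “low” issues can jointly constitute a critical path.

```python
# Each hypothetical finding needs one capability and grants another.
# Individually each would triage as low or medium severity.
findings = {
    "info-leak": {"needs": "unauthenticated",  "gives": "internal-token"},
    "ssrf":      {"needs": "internal-token",   "gives": "metadata-access"},
    "priv-esc":  {"needs": "metadata-access",  "gives": "admin"},
}

def chain_to(goal, have="unauthenticated", path=()):
    """Depth-first search for a sequence of findings reaching `goal`."""
    if have == goal:
        return list(path)
    for name, f in findings.items():
        if f["needs"] == have and name not in path:
            result = chain_to(goal, f["gives"], path + (name,))
            if result is not None:
                return result
    return None  # no chain reaches the goal

# Three low/medium findings combine into one critical exploit path.
print(chain_to("admin"))
```

A human triage process scores each of these findings in isolation; the search above is the combinatorial reasoning that a model can run across an entire codebase.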
The Shift — from Triage to Velocity
Two consequences emerge from these capabilities. The volume of known vulnerabilities across the industry will rise substantially over the next couple of years. The interval between disclosure and exploitation will continue to compress. Attackers with access to these capabilities will move from AI-assisted operations to AI-driven ones, developing exploits nearly in real time rather than over weeks.
In that environment, the defender’s advantage comes from patching velocity, not triage precision. Prioritization as a strategy begins to fail: by the time a defender has ranked an issue as low priority, an AI-assisted attacker may already have chained it with two others into a working exploit. The security fundamentals still apply — patch hygiene, least-privilege access, defense in depth, correlated telemetry, insider threat controls — but the doctrine built on top of them has to change.
Three Phases Ahead
We expect the impact of these capabilities to unfold in three phases.
Phase one is underway now. Critical software providers and major open source projects are producing an initial wave of fixes. This phase is orderly because the organizations involved have the engineering capacity to absorb it. Broadcom has effective processes in place, and customers should pay close attention to update notifications and security advisories.
Phase two will follow over the next six to 18 months. Broader waves will reach widely deployed software with thinner maintainer resources, including long-tail open source. This phase will be more operationally disruptive: downstream dependencies are less understood, and the affected organizations often lack the engineering depth to respond at the required speed. Broadcom is committed to maintaining strong vulnerability management programs for our software supply chains.
Phase three is the steady state defenders should already be planning for. The volume of known vulnerabilities will stabilize at a significantly higher level. The interval between discovery and exploitation will be short. Here again, the defender’s advantage will come from patching velocity, not triage precision.
What Technology Teams Should Do — Especially Now
The initial wave of AI-discovered vulnerabilities will be addressed primarily through vendor patches. Every major software vendor, Broadcom included, will have new capabilities to find and fix issues at a greater scale and speed. For most enterprises, the practical implications are direct:
- Move to the latest versions of your vendor software. The further you are from current, the more publicly known issues you are carrying. This is true for every vendor in your stack.
- Automate your patching pipeline. Manual testing and staged approval cycles designed for quarterly releases are not compatible with the cadence the new environment requires. Emergency change pathways for critical security patches should be established as standard operating procedure, not exception.
- Measure mean-time-to-remediation and treat it as a core operational metric. Days-long remediation windows will not be sufficient. Set explicit targets, measure against them, report to senior leadership.
- Invest in the discipline required to use AI on the defender’s side. Context, tooling, direction, and adversarial validation are not optional additions to model deployment — they are what makes the difference between noise and useful output. Most organizations will not build this discipline from scratch; platform vendors, infrastructure providers, and open collaboration exist to do this work on their behalf.
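Mean-time-to-remediation is straightforward to compute once disclosure and patch timestamps are logged. The sketch below is a minimal illustration, assuming a hypothetical remediation log of (advisory published, patch deployed) pairs; real pipelines would pull these from ticketing and deployment systems.

```python
from datetime import datetime, timedelta
from statistics import mean

def mttr_hours(windows):
    """Mean time-to-remediation in hours from (disclosed, patched) pairs."""
    return mean((patched - disclosed) / timedelta(hours=1)
                for disclosed, patched in windows)

# Hypothetical remediation log: advisory publication -> patch deployed.
log = [
    (datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 21, 0)),  # 12 h
    (datetime(2025, 3, 4, 8, 0), datetime(2025, 3, 5, 8, 0)),   # 24 h
]
print(f"MTTR: {mttr_hours(log):.1f} h")  # prints "MTTR: 18.0 h"
```

Reported as a trend against an explicit target, this single number makes remediation velocity visible to senior leadership.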
This is the standard that the industry needs to adopt. The first defense most organizations will have against the AI-accelerated vulnerability deluge is the speed at which they can apply updates and mitigations.
What Broadcom’s Infrastructure Software Group is Doing
Internally, we are integrating frontier AI models into our security engineering work end to end: vulnerability discovery, exploit validation, patch generation, and regression testing. The discipline required to get value from these models — context, tooling, direction, validation — is being embedded in how our engineering teams operate.
Broadcom’s Infrastructure Software Group operates on the view that the defender community at large — the organizations that depend on software, regardless of their scale or internal resources — must be able to take advantage of frontier AI models for security. That framing determines how we invest in our capabilities, partnerships, and contributions to open standards. Our responsibility is to make sure that the defender community can reach the same security outcomes through the vendors and partners they already trust.
The Bottom Line
Staying current is the baseline. The organizations that do this well — that consume vendor updates rapidly, that automate their remediation pipelines, that measure and reduce mean-time-to-remediation — will be the ones that come through this transition with their operations intact.
Broadcom Infrastructure Software FAQ: Adapting to AI-Accelerated Vulnerability Discovery
