Log4j, SolarWinds, dependency confusion. Software supply chain security took years to become a board-level concern. "What's in our build" became a compliance and risk question. The same question is now landing on AI: what's in the model, who produced the data it was trained on, and how do we know the weights we're running are the weights we think they are? The DoD has made that question explicit. Section 1512 of the National Defense Authorization Act extends SBOM-style transparency to AI systems: algorithms, models, datasets, and their provenance. By August 2026 the Department must report on implementation. For vendors and integrators, that's a deadline. For everyone else, it's a signal. The assumption that you can treat a model or a dataset as a black box—download, fine-tune, deploy—is being retired. The pipeline from training data to model weight to inference is a supply chain. It has the same kinds of failure modes as any other: poisoned inputs, compromised artifacts, and opaque provenance.
Basilisk Venom: Poison in the pipeline
In early 2025 a concrete attack illustrated what had long been theorized. Malicious actors planted jailbreak-style instructions inside public GitHub repositories. Those repos were later used in the data mix for fine-tuning DeepSeek's DeepThink (R1) models. The poison was inert until the model was trained. Then it wasn't. Models trained on the contaminated data began to exhibit the embedded behaviors—not because an attacker was live-in-the-middle at inference time, but because the training data had been weaponized. The project was dubbed Basilisk Venom. It's a useful name: the threat is in the pipeline, not only at the API.
One vendor had a bad data source. The lesson: any pipeline that pulls from the open web, from public repos, or from third-party data providers is vulnerable to the same pattern. Adversaries don't need to compromise your model server. They need to get malicious or misleading content into a dataset that will eventually be used for training or fine-tuning. That content can be subtle—a few poisoned examples in a large corpus—or brazen. Research has shown that poisoning a small fraction of training traces (on the order of a few percent) can embed backdoors that trigger on specific phrases and cause models to leak confidential information or follow unsafe instructions at high success rates. The supply chain problem isn't abstract. It's "what data did we use, where did it come from, and can we trust it?"
From SBOM to AI-BOM
The NDAA language is clear: policies and regulations that apply to software bills of materials shall also apply to "artificial intelligence systems, models, and software used, developed, or procured by the Department." That means tracking models, datasets, libraries, and dependencies; documenting provenance; and updating procurement to enforce that transparency. The Secretary of Defense must issue a DoD-wide AI cybersecurity and governance policy within 180 days, covering the full lifecycle from training through deployment and retirement. The goal is the same as with traditional SBOMs: know what you're running, where it came from, and what's in it.
An AI-BOM, then, isn't a single document. It's a discipline. For every model you deploy or procure, you need to be able to answer: what base model, what training or fine-tuning data (at least at the source or corpus level), what frameworks and libraries were used to train or serve it, and what checks—if any—were applied to the data and the resulting weights. That doesn't require publishing proprietary data. It does require internal traceability and, for DoD and other regulated contexts, the ability to report it. Vendors that sell models or AI-powered systems will increasingly be asked for AI-BOM-style documentation. Getting ahead of that—building provenance tracking into your own ML pipelines and data pipelines—reduces last-minute scramble and positions you for contracts that demand it.
Backdoored fine-tuning and compromised weights
Training data is one attack surface. The model artifact itself is another. Pretrained weights can be tampered with before they're ever fine-tuned. Work on "privacy backdoors" showed that corrupted pretrained models can be used to extract fine-tuning data with high success. So the threat isn't only "we trained on bad data." It's also "we started from weights that were already compromised." Similarly, fine-tuning adapters—LoRA, QLoRA, and the like—can be backdoored. An attacker who contributes a poisoned adapter or who compromises the pipeline that produces adapters can inject behaviors that trigger under specific inputs. The result is a model that looks normal on standard benchmarks but misbehaves when an adversary uses the right trigger. That's a supply chain attack on the model artifact.
Defense here overlaps with classic software supply chain: verify provenance, integrity, and—where possible—reproducibility. Integrity means checking that the weights or adapters you're using match what the provider claims (e.g., hashes, signatures). Provenance means knowing who produced them and from what base and data. Reproducibility is harder for large models but matters for high-assurance contexts: can you reproduce the training run and get the same weights? Many teams will rely on signed artifacts from trusted providers and on internal controls that prevent unsigned or unvetted models from reaching production. The point is to treat model weights and adapters as deployables that need the same kind of control as any other binary.
Verification when you don't control the pipeline
A trickier case is fine-tuning-as-a-service. You send your data to a provider; they fine-tune a model and return it. You never see the training run. How do you know they actually used your data and didn't just return a lightly tweaked base model? Work like vTune addresses exactly that. By injecting a small number of backdoor data points into the dataset you send for fine-tuning, you can later test whether the returned model responds to the backdoor trigger. If it does, you have statistical evidence that your data influenced the model. The method requires only a few thousand extra tokens and a handful of inference calls to verify, with minimal impact on normal task performance. It's a way to get assurance when you don't control the pipeline—a verification layer on top of a supply chain you can't fully see.
That pattern—injecting verifiable signals into the supply chain—is a useful mental model. You may not be able to audit every data source or every training run. You can design checks that detect whether your assumptions (e.g., "our data was used," "this model wasn't swapped") hold. AI-BOM and provenance tracking tell you what's supposed to be there. Verification and integrity checks tell you whether what you have matches.
Applying supply chain thinking to ML
Treat the ML pipeline like a build pipeline. Know your sources: base models, datasets, and libraries. Prefer signed, versioned artifacts and documented provenance. Segment and minimize: don't pull from unvetted or overly broad data sources if you're building something that will handle sensitive or safety-critical tasks. Validate and test: run your models against adversarial and backdoor probes; monitor for drift and unexpected behavior in production. And plan for disclosure and response: when a critical vulnerability is found in a model or a dataset (e.g., a widely used training corpus is found to be poisoned), have a process to assess impact, update or replace artifacts, and notify stakeholders.
The DoD's NDAA move is a forcing function. It won't be the only one. Regulators and enterprises are starting to ask the same questions: what's in our AI stack, and can we trust it? The answers require the same kind of discipline that software supply chain has slowly adopted—provenance, integrity, verification, and response. From training data to model weights, the pipeline is the supply chain. Securing it is no longer optional.
Building AI-BOM practices or assessing AI supply chain risk? We do AI governance and security reviews. Get in touch.