
Your AI Policy Needs an Incident Response Plan: What Happens When a Model Fails in Production


Most organizations have an incident response plan for data breaches. Few have one for when a model hallucinates and wrong information reaches customers, when a regulator flags biased decisions, or when prompt injection compromises a customer-facing agent. Security runs point because it's "an incident," but the playbook was written for stolen credentials and leaked databases, not for "the model did something we didn't intend." AI failures need their own response path: clear roles, escalation triggers, containment procedures, and post-incident review. Here's how to build an AI incident response playbook that fits alongside your existing IR and actually gets used.

What Counts as an AI Incident?

An AI incident is an event where an AI system causes or could cause harm, violates policy or law, or materially fails to perform as designed in a way that affects users, decisions, or data. That covers a lot:

- Hallucinated or factually wrong outputs that reach customers or feed downstream decisions.
- Biased or discriminatory outputs that affect individuals or draw regulatory attention.
- Prompt injection or other attacks that cause the system to leak data, bypass safeguards, or behave maliciously.
- Model or data drift that degrades performance and leads to wrong decisions.
- Unauthorized or unintended use (e.g., someone using a high-risk model for a use case it wasn't approved for).
- Data exposure through the AI (inputs or outputs logged, trained on, or exfiltrated by a vendor).
- Failures of human oversight (e.g., a human-in-the-loop step that was skipped or automated away).

If it's serious enough that you'd want to contain it, investigate it, and possibly report it, treat it as an AI incident and run it through the playbook.

Not every bug is an incident. A typo in a non-critical output might be a ticket. A wrong recommendation that affected one user and was caught and corrected might be a low-severity incident. Define severity (e.g., single user vs. population, low impact vs. regulatory or reputational risk) so the team knows when to escalate. The playbook should say what an AI incident is and give examples at each severity level so people don't ignore serious issues or flood the process with noise.
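One way to make "define severity" concrete is a small rubric the on-call team can apply without debate. The sketch below is illustrative only: the input signals (`users_affected`, `regulatory_risk`, `harm_to_individuals`) and the thresholds are assumptions you'd replace with your own risk criteria.

```python
# Hypothetical severity rubric. Signals and thresholds are illustrative;
# tune them to your own risk appetite and document them in the playbook.
def classify_severity(users_affected: int,
                      regulatory_risk: bool,
                      harm_to_individuals: bool) -> int:
    """Return 1 (highest severity) through 3 (lowest)."""
    if harm_to_individuals or (users_affected > 1000 and regulatory_risk):
        return 1  # immediate response
    if users_affected > 1 or regulatory_risk:
        return 2  # same-day response
    return 3      # single user, low impact: structured response within days
```

The point is not the specific cutoffs but that the rubric is written down, so "is this really an incident?" has a default answer.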

Clear Roles

Someone has to own the response. AI incidents often span engineering, product, security, legal, compliance, and communications. Without assigned roles, everyone assumes someone else is handling it.

Incident lead. One person runs the response until the incident is contained and handed off to post-incident review. They coordinate containment, communication, and decisions. For AI incidents this is often the owner of the affected system or a designated AI governance lead, with security in the loop when there's a security angle (prompt injection, data exposure). The lead doesn't have to do everything. They have to ensure everything gets done and that decisions are made and recorded.

Technical lead. Someone who can assess what the system did, what failed, and what needs to change. They support containment (e.g., turning off a feature, rolling back a model, tightening inputs) and provide the technical narrative for the incident report. Usually the engineering or ML owner for the system.

Legal and compliance. Involve them early when the incident might trigger notification obligations, regulatory scrutiny, or liability. They help decide what to document, what to say externally, and whether to report. For bias or discrimination issues they're essential. For prompt injection or data exposure they'll want to align with your breach IR if applicable.

Communications. For customer-facing or reputational impact, someone needs to own internal and external messaging. That might be comms, product, or the incident lead depending on size. The playbook should say who is authorized to speak and when to escalate to leadership.

Escalation path. Define when the incident lead escalates to leadership (e.g., severity threshold, regulatory impact, public disclosure). List names or roles (e.g., CISO, general counsel, head of product). When it's 2 a.m. and the lead needs a call on whether to take the system down, they need to know who to call.

Document these roles in the playbook and keep the list updated. Run a tabletop so the people in those roles have done it once before the real incident.

Escalation Triggers

The playbook needs to say when to invoke it. Triggers should be specific enough that people don't hesitate ("is this really an incident?") and broad enough that serious issues aren't missed.

Severity-based triggers. Define levels. Example:

- Severity 1, immediate response: harm or high likelihood of harm to individuals (a wrong decision affecting rights, safety, or eligibility); active exploitation (prompt injection, data exfiltration); wrong, biased, or otherwise harmful output reaching a large population.
- Severity 2, same-day response: wrong or biased output affecting a limited set of users; performance degradation that could affect decisions; a regulator or auditor inquiry about the system.
- Severity 3, structured response within a few days: single-user impact corrected quickly; a near-miss (e.g., caught before rollout).

When in doubt, escalate. The playbook can say "if you're unsure, treat it as Severity 2 and let the incident lead downgrade."
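Pairing each severity level with a response window and a notification list keeps the escalation decision mechanical. A minimal sketch, assuming invented role names and windows (substitute your own):

```python
from datetime import timedelta

# Illustrative severity-to-response mapping. Role names and windows
# are assumptions, not recommendations.
RESPONSE_SLA = {
    1: {"respond_within": timedelta(hours=1),
        "notify": ["incident_lead", "ciso", "general_counsel"]},
    2: {"respond_within": timedelta(hours=8),
        "notify": ["incident_lead", "system_owner"]},
    3: {"respond_within": timedelta(days=3),
        "notify": ["system_owner"]},
}

def escalation_for(severity: int) -> dict:
    # "If you're unsure, treat it as Severity 2": unknown values
    # default to the Severity 2 response rather than silence.
    return RESPONSE_SLA.get(severity, RESPONSE_SLA[2])
```

Encoding the "when unsure, default up" rule in the lookup itself means a mis-scored incident still gets a same-day response.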

Source-based triggers. Who can declare an incident? Any engineer or product owner who sees a failure that meets the definition. Security when they detect attack or abuse. Legal or compliance when they receive a regulator request or a threat of action. Customer support when they see a pattern of user-reported errors or harm. The playbook should say "if you see X, open an incident and notify [incident lead or channel]." Make the channel visible (e.g., Slack, ticket queue, on-call).

Don't wait for proof. If there's a credible report of bias, prompt injection, or major hallucination, start the process. You can confirm severity and adjust. Waiting until you've finished the investigation to "declare" an incident is how containment gets delayed and evidence gets lost.

Containment Procedures

Containment is "stop the harm or reduce the risk while we figure out what happened." It's not the same as root-cause analysis. Do containment first.

Immediate options. Depending on the system and the failure: disable the feature or the model for affected users or globally; roll back to a previous model or config; restrict inputs (e.g., block the attack vector); add or tighten human review for outputs before they reach users; revoke or restrict access to the system. The playbook should list these options and who can execute them (e.g., on-call engineer can disable feature; rollback might need lead approval). For high-risk systems, document the "kill switch" or rollback procedure in advance so it's not designed under pressure.
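A documented kill switch can be as simple as a feature flag plus an audit trail of who flipped it and why. This is a minimal sketch: the flag name, flag store, and log are all invented for illustration; in practice this would be your feature-flag service and its change log.

```python
import time

# Hypothetical in-memory flag store and audit log, standing in for a
# real feature-flag service. Names are illustrative.
FLAGS = {"ai_summary_feature": True}
AUDIT_LOG = []

def disable_feature(flag: str, actor: str, reason: str) -> None:
    """Flip the flag off and record who did it, when, and why."""
    FLAGS[flag] = False
    AUDIT_LOG.append({
        "ts": time.time(),
        "action": "disable",
        "flag": flag,
        "actor": actor,
        "reason": reason,
    })

# Example: on-call engineer executes the documented kill switch.
disable_feature("ai_summary_feature", "oncall-engineer",
                "Severity 1: wrong outputs reaching customers")
```

The audit record matters as much as the switch: it feeds the incident timeline and shows the response followed the playbook.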

Preserve evidence. Before changing too much: log the state of the system, the inputs and outputs involved, and who did what when. Take screenshots or exports if relevant. You'll need this for post-incident review and possibly for legal or regulatory response. The playbook should say "preserve X before making changes" and who is responsible.
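"Preserve X before making changes" is easier to follow if there's a one-call snapshot routine. The sketch below is an assumption-laden illustration: field names are invented, and the content hash is one simple way to later show a record wasn't altered after capture.

```python
import hashlib
import json
from datetime import datetime, timezone

def preserve_evidence(incident_id: str, prompt: str, output: str,
                      model_version: str, config: dict) -> dict:
    """Snapshot the inputs, outputs, and system state tied to an incident."""
    record = {
        "incident_id": incident_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "model_version": model_version,
        "config": config,
    }
    # Hash the record so tampering after capture is detectable.
    blob = json.dumps(record, sort_keys=True).encode()
    record["sha256"] = hashlib.sha256(blob).hexdigest()
    return record  # write to append-only storage, not a scratch disk
```

Capturing model version and config alongside the prompt/output pair is what lets post-incident review reproduce the failure.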

Communicate internally. Notify the people who need to know: incident lead, technical lead, legal/compliance if triggered, leadership if severity warrants. Use a single channel or thread so updates are in one place. Avoid speculation in writing until you have a shared picture.

Customer and external communication. If users or the public are affected, follow your comms plan. Acknowledge the issue, say what you're doing, and commit to a timeline for more information. Don't promise root cause before you have it. Legal and comms should align on wording.

Containment ends when the immediate risk is reduced and the situation is stable enough to move to investigation and post-incident review.

Post-Incident Review

After containment, run a structured review. Not a blame session. A learning session.

What happened. Timeline of the incident: when it started, when it was detected, when it was contained. What the system did wrong (hallucination, bias, compromise, drift). What the impact was (who was affected, how, what data or decisions were involved).

Why it happened. Root cause. Was it a model bug, bad data, an attack, a process failure (e.g., human review skipped), or a gap in design? Be specific. "The model wasn't tested for this input type" or "we didn't have input validation for prompt injection" is more useful than "the AI failed."

What we're doing about it. Actions with owners and deadlines. Fix the bug, add the test, tighten the process, update the model, add monitoring. Some actions might be "update the AIA" or "re-run risk assessment." Track these like any other remediation plan.
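"Track these like any other remediation plan" implies each action has an owner, a deadline, and a status you can query. A minimal sketch, with invented owners and dates:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Action:
    """One remediation item from post-incident review."""
    description: str
    owner: str
    due: date
    done: bool = False

# Illustrative action items; owners and dates are assumptions.
actions = [
    Action("Add input validation for prompt injection",
           "ml-platform", date(2025, 7, 1)),
    Action("Re-run risk assessment for the affected system",
           "governance", date(2025, 7, 15)),
]

def overdue(items: list, today: date) -> list:
    """Actions past their deadline and still open."""
    return [a for a in items if not a.done and a.due < today]
```

The `overdue` check is the part that makes this a plan rather than a wish list: someone reviews it on a schedule and chases the owners.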

What we're changing in the playbook. Did the response work? Was anything missing (roles, triggers, containment steps)? Update the playbook so the next incident runs smoother. Document lessons in a short report and store it with your incident log.

Regulatory and legal follow-up. If there are notification or reporting obligations, legal and compliance own those. If a regulator is involved, document what you're providing and when. Post-incident review should capture what was reported and to whom.

Your AI policy sets the rules. The incident response plan is what happens when the rules weren't enough or something went wrong anyway. Define the incidents, assign the roles, set the triggers, document containment, and close the loop with post-incident review. When a model fails in production you'll have a playbook instead of a scramble.


Need help building or stress-testing your AI incident response? We do independent AI risk assessments and governance program design. Get in touch.
