A model that summarizes internal meeting notes is not the same as one that screens job applicants or suggests credit decisions. Treating them the same either bogs down low-risk use in process or under-scrutinizes high-risk use.
A red-light, yellow-light, green-light classification tiers the scrutiny: prohibited, cautious (allowed with conditions and human review), and standard (allowed within guardrails). The hard part isn't the three buckets. It's defining what goes in each and building a screening process that routes every new use case to the right one.
What Puts a Use Case in Each Tier
The tiers are defined by impact and data, not by technology. The same model can be red in one context and green in another.
Red light (prohibited): use cases you don't allow without a formal, documented exception, and maybe not even then. Typical criteria: the use case involves putting confidential, proprietary, or regulated data (client data, customer PII, source code, trade secrets, health or financial records) into public or unvetted AI systems. Or the AI is making or materially influencing a decision that affects a person's rights, eligibility, or treatment (hiring, firing, credit, insurance, benefits, lending, discipline, law enforcement). Or the use case creates a high likelihood of harm that you're not willing to accept even with mitigations. Red doesn't mean "we never do this." It means the default is no; if there's a business case, it goes through a waiver and full risk assessment, and you may still say no. Examples: client data in a public LLM, unvetted AI screening resumes, AI-generated output used as the sole basis for denying a loan or claim.
Yellow light (cautious): use cases you allow, but only with clear conditions and human review. The AI assists; it doesn't decide alone. Data might be sensitive but not the most sensitive, or the use is internal and contained. Criteria: the use case involves data or decisions that could cause harm if wrong, but human review or other controls can catch errors. Or the tool or vendor hasn't been through your full assessment yet but is in a controlled pilot. Or the use case is novel and you're still learning. Conditions typically include: use only approved tools, only for specified data types and use cases, with human review of outputs before any consequential action, and with logging and monitoring. Examples: AI-assisted research where a human verifies sources and conclusions before publication; draft generation for internal docs with no confidential data, with human edit before use; coding assistance with no proprietary or customer data in the prompt, with code review before merge. Yellow means yes, but we're watching and we expect a human in the loop.
Green light (standard): use cases allowed within your approved-tool and data guardrails, with no extra approval step. Low impact, low sensitivity, or well-understood and already assessed. Criteria: the use case uses only approved tools and stays within the data and use-case limits you've defined. No confidential or regulated data. No decisions that directly affect people's rights or eligibility. Outputs are either not consequential or are routinely verified (e.g., summarization of non-sensitive content, translation, formatting). Examples: document summarization of public or internal non-confidential material with outputs used as a starting point for human work; productivity use (scheduling, formatting, brainstorming) on non-sensitive content; coding assistance on open-source or non-proprietary code within approved tools. Green doesn't mean "no rules." It means follow the policy and the approved list; no extra screening for this use case.
The same activity can move tiers when context changes. "Summarize this" is green for a public blog post, yellow if the doc might contain internal strategy, red if the doc is client confidential.
The Screening Process
When a new use case shows up (someone asks "can we do X?" or you discover a use you didn't know about), you need a consistent way to put it in a tier and then apply the right level of scrutiny.
Capture the use case. Get enough detail to classify it. What is the AI doing? What data does it see (type, sensitivity, volume)? Who uses it and for what decision or action? Is there a human in the loop today, and where? What tool or model is in use or proposed? A short intake form or template keeps this consistent. Without it, you'll argue about the tier forever.
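If it helps to see what "enough detail to classify" looks like, the intake can be a small structured record. Here's a minimal sketch in Python; every field name is purely illustrative, not a prescribed schema, and your own intake form will define its own fields.

```python
from dataclasses import dataclass

# Illustrative intake record: every field name here is an assumption,
# not a prescribed schema. It captures the questions above: what the AI
# does, what data it sees, who acts on it, whether a human reviews,
# which tool is involved, and whether it has been assessed before.
@dataclass
class UseCaseIntake:
    name: str
    description: str              # what the AI is doing
    data_types: list[str]         # e.g. "public", "internal", "client_pii"
    decision_impact: str          # "none", "advisory", or "rights_or_eligibility"
    users: str                    # who uses it and for what decision or action
    human_in_loop: bool           # human review before any consequential action?
    tool: str                     # which model or vendor tool
    tool_approved: bool           # is that tool on the approved list?
    prior_risk_assessment: bool   # has this use or tool been formally assessed?
```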
Apply the criteria. Run the use case against your red criteria first. Does it involve prohibited data in unvetted systems? Does it involve AI-influenced decisions on rights or eligibility without prior assessment? If yes, it's red unless someone requests a waiver. Then check yellow: sensitive data or consequential decisions but with human review and approved or pilot tools? Novel or not-yet-assessed? Yellow. Otherwise, if it fits approved tools and data and is low impact, it's green. Document the tier and the reason in a line or two. That record is what you'll use for audits and for revisiting when the use case or your policy changes.
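The ordering itself can be written down as a short decision function. This is a sketch building on the intake record above; the prohibited-data set and decision-impact labels are illustrative stand-ins for whatever your own criteria name.

```python
PROHIBITED_DATA = {"client_pii", "phi", "source_code", "trade_secret"}  # illustrative list

def classify(uc: UseCaseIntake) -> tuple[str, str]:
    """Return (tier, one-line reason). Red criteria are checked first,
    then yellow; green is the remainder, mirroring the order above."""
    # Red: prohibited data headed into an unvetted system, or an
    # AI-influenced decision on rights/eligibility with no prior assessment.
    if PROHIBITED_DATA & set(uc.data_types) and not uc.tool_approved:
        return "red", "confidential or regulated data in an unvetted system"
    if uc.decision_impact == "rights_or_eligibility" and not uc.prior_risk_assessment:
        return "red", "AI-influenced decision on rights or eligibility without prior assessment"

    # Yellow: sensitive data, consequential decisions, or a tool or use case
    # that hasn't been fully assessed; allowed only with conditions.
    if PROHIBITED_DATA & set(uc.data_types) or uc.decision_impact != "none" or not uc.tool_approved:
        return "yellow", "consequential or not yet fully assessed; conditions and human review apply"

    # Green: approved tool, non-sensitive data, low impact.
    return "green", "within approved tools and data limits"
```

Checking red before yellow matters: a use case that trips a red criterion shouldn't soften into yellow just because a human happens to be somewhere in the loop.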
Route to the right process. Red: no go without a waiver; a waiver triggers a full risk assessment and sign-off. Yellow: allow only under stated conditions (approved tool, human review, data limits, logging); might require a lightweight risk review or a pilot agreement. Green: no extra approval; the user confirms they're within policy and the approved list. The screening outcome isn't just "red, yellow, or green." It's "this tier, and here's what happens next." The person who asked gets a clear answer, and the organization applies the right level of oversight.
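Routing is then a matter of attaching "what happens next" to the tier. Continuing the sketch, with example next steps rather than a mandated workflow:

```python
# What happens next for each tier. The steps are examples, not a mandated workflow.
NEXT_STEPS = {
    "red":    ["no-go by default", "waiver request", "full risk assessment", "sign-off"],
    "yellow": ["allowed under stated conditions", "lightweight risk review or pilot agreement",
               "approved tool, human review, data limits, logging"],
    "green":  ["allowed", "user confirms tool and data are within policy"],
}

def screen(uc: UseCaseIntake) -> dict:
    """The record you keep: the tier, the one-line reason, and what happens next."""
    tier, reason = classify(uc)
    return {"use_case": uc.name, "tier": tier, "reason": reason, "next_steps": NEXT_STEPS[tier]}
```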
Revisit when something changes. Use cases drift. New data gets added. The tool gets an upgrade. Regulation changes. Classification isn't one-time. When you learn that a use case has changed (new data type, new decision point, new tool), rerun the screening. When you change your criteria (e.g., you tighten what counts as red), run existing use cases through again. Build a simple trigger: rescreen on any major change to the tool, data, or decision flow, and at least annually for yellow- and red-light use cases.
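The trigger can be made just as mechanical. A sketch under the same assumptions; the change categories and the one-year window are illustrative defaults, not requirements.

```python
from datetime import date, timedelta
from typing import Optional

def needs_rescreen(tier: str, last_screened: date, change: Optional[str] = None) -> bool:
    """Illustrative trigger: rescreen on a major change to tool, data, or
    decision flow, and at least annually for yellow- and red-light use cases."""
    if change in {"tool", "data", "decision_flow"}:
        return True
    if tier in {"yellow", "red"}:
        return date.today() - last_screened > timedelta(days=365)
    return False
```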
Keeping the Criteria Honest
The biggest failure mode is everything drifting into green because the criteria are vague or the screening is skipped. "We're just summarizing" can hide client data. "There's always a human in the loop" can mean the human rubber-stamps the AI. Make the criteria testable. For red: "Does this use case put [list your data types] into a system that isn't on our approved list?" "Does the AI output directly determine [list your decision types]?" For yellow: "Is there a human who must review before any consequential action?" "Are the tool and data scope documented?" For green: "Is the tool on the approved list, and are the use case and data type within the stated scope?" If you can't answer yes or no, the criteria need tightening.
Also make screening mandatory for new use. No "we'll get to it." New use cases (and discovered shadow use) get screened before they're allowed to continue or before they're formally approved. That way red doesn't slip through as "we didn't know," and yellow doesn't become the default dump bucket for "we didn't want to say no."
Red, yellow, green is simple. The work is defining the tiers for your organization, applying them consistently, and routing every new use case through a screening process so the right level of scrutiny always applies.
We design risk-tiered classification and screening processes for AI use cases. Contact us for independent AI risk assessments and governance framework design.