Guardrails
Hard limits on what an AI agent can do, enforced by the system rather than the prompt: scoped permissions, protected files, required reviews, and kill switches.
What it is
Guardrails are the limits an AI agent physically cannot cross, no matter what it decides to do: read-only access tokens, protected branches, files it cannot touch, actions that always require human approval, and a kill switch that stops everything. The defining property: they are enforced by the system, not requested in the prompt.
Why this matters for designers
"I told it not to" is not a safety strategy. Models drift, prompts get truncated out of context, and agents misread instructions in ways that look reasonable. Guardrails are what let you say yes to agents at all: you can give an agent real work on your design system precisely because the blast radius of its worst mistake is bounded in advance.
How it works in practice
- Scope access mechanically: read-only tokens for observers, branch protection for actors.
- Mark protected zones (token sources, brand assets, release pipelines) that no agent edits directly.
- Gate irreversible actions (merge, publish, delete) behind human approval.
- Keep a kill switch that revokes all agent access in one step, and test that it works.
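
The four practices above can be sketched as a single policy check that runs outside the agent. This is a minimal illustration, not a real enforcement layer: in practice the same rules would live in your platform's token scopes, branch protection, and approval settings. Every name here (AgentSession, Guardrails, the glob patterns, the action names) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical protected zones: token sources, brand assets, release pipelines.
PROTECTED_GLOBS = ("tokens/*", "brand/*", "release/*")
# Hypothetical names for actions that cannot be undone.
IRREVERSIBLE_ACTIONS = {"merge", "publish", "delete"}


@dataclass
class AgentSession:
    name: str
    can_write: bool = False  # read-only by default (an observer)
    revoked: bool = False    # flipped by the kill switch


@dataclass
class Guardrails:
    sessions: list = field(default_factory=list)

    def register(self, session: AgentSession) -> AgentSession:
        self.sessions.append(session)
        return session

    def kill_switch(self) -> None:
        # One step revokes every registered agent session.
        for session in self.sessions:
            session.revoked = True

    def check(self, session: AgentSession, action: str,
              path: str = "", human_approved: bool = False) -> str:
        # The decision never consults the prompt or the agent's reasoning.
        if session.revoked:
            return "deny: access revoked"
        if action == "read":
            return "allow"
        if not session.can_write:
            return "deny: read-only session"
        if any(fnmatchcase(path, glob) for glob in PROTECTED_GLOBS):
            return "deny: protected zone"
        if action in IRREVERSIBLE_ACTIONS and not human_approved:
            return "deny: needs human approval"
        return "allow"
```

The order of the checks is the design choice worth noting: revocation comes first so the kill switch overrides everything else, and protected zones are checked before approval so that human sign-off on a merge cannot be used to edit a token source directly. Testing the kill switch, as the last bullet recommends, amounts to calling kill_switch() and confirming that a previously allowed action is now denied.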