How to Build an AI Agent Registry — Part 2: Risk Class and Autonomy Level
Today: Agent Registry - the enforcement layer that makes your registry more than a catalogue. What is it, and how do you build it?
Where We Left Off
Part 1 covered Schema and Ownership: the description and accountability layer of an Agent Registry. Unity Catalog as the agent registry. AWS IAM for per-agent identity. CloudTrail for audit trails. A stack that answers the questions an audit demands: what exists, who owns it, what it can do.
That is necessary, but it is not sufficient.
Schema and Ownership describe an agent. They do not constrain it. An agent with a well-documented schema and a clear owner can still make a consequential decision without human approval, access data it should not touch, or operate at an autonomy level the business never signed off on. That is where Risk Class and Autonomy Level come in, and where the registry becomes a governance control rather than a filing system.
Component 3: Risk Class - Not All Agents Carry Equal Risk
Risk classification is the decision that determines what governance an agent receives. Get it wrong and you either over-govern low-risk agents until teams route around the process, or under-govern high-risk agents until something goes wrong in production.
The four risk categories worth working with in a regulated environment:
Low Risk: bounded output, read-only, no consequential action. An agent that summarises documents or answers questions from a knowledge base.
Medium Risk: supervised action. The agent can write or update records, but a human reviews before execution.
High Risk: consequential action inside a tightly constrained boundary. The agent can execute without per-action approval, but only against approved tools, approved data paths, and fully audited systems.
Critical Risk: autonomous consequential action across multiple systems, with downstream effects that are difficult or impossible to reverse.
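To make the four categories concrete, here is a minimal Python sketch of a first-pass classifier. The `AgentProfile` fields and the decision order are my assumptions, chosen to mirror the definitions above; treat the result as a suggestion for a human reviewer, not a final classification.

```python
from dataclasses import dataclass
from enum import Enum


class RiskClass(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class AgentProfile:
    # Hypothetical attributes; map them to whatever your schema records.
    can_write: bool                      # can it change records or trigger actions?
    human_review_before_execution: bool  # does a human review before execution?
    systems_touched: int                 # number of systems it can act on
    hard_to_reverse: bool                # are downstream effects hard to undo?


def classify(profile: AgentProfile) -> RiskClass:
    """Suggest a risk class by walking the definitions from lowest to highest."""
    if not profile.can_write:
        return RiskClass.LOW
    if profile.human_review_before_execution:
        return RiskClass.MEDIUM
    # Follows the Critical definition literally: multi-system AND hard to reverse.
    if profile.systems_touched > 1 and profile.hard_to_reverse:
        return RiskClass.CRITICAL
    return RiskClass.HIGH
```

A document summariser would come out as `RiskClass.LOW`; an agent that executes across several systems with irreversible effects would come out as `RiskClass.CRITICAL`.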
In Databricks, risk class is best treated as governed metadata attached to the agent assets you register: for example, Unity Catalog models, model versions, and functions, all of which support tagging. That makes risk class visible, queryable, and reviewable across the lifecycle. If an agent gains new tools, new data access, or a broader execution boundary, the classification should be reassessed before the next promotion.
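The reassessment trigger can be checked mechanically. The sketch below compares the boundary of the last approved version with a candidate version and flags any expansion. `AgentBoundary` and its fields are hypothetical names, and it assumes you can list an agent's tools and data paths as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentBoundary:
    # Hypothetical record of what an agent version can reach.
    tools: frozenset
    data_paths: frozenset


def needs_reassessment(approved: AgentBoundary, candidate: AgentBoundary) -> bool:
    """True if the candidate reaches any tool or data path the approved version did not."""
    new_tools = candidate.tools - approved.tools
    new_paths = candidate.data_paths - approved.data_paths
    return bool(new_tools or new_paths)
```

Removing a tool does not trigger reassessment here, only adding one does. Whether a narrowed boundary should also prompt a review is a policy choice this sketch leaves open.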
So, if you are on Databricks, use Unity Catalog to register the agent, its model versions, its functions, and its metadata. Use tags to record risk class. Use Unity Catalog privileges, managed authentication, and deployment-time permission checks to enforce which tools and data paths are actually reachable at runtime. If you need an explicit risk-policy engine that says, for example, a High Risk agent may call Tool A but never Tool B, that policy layer still sits outside Databricks - in an API gateway, middleware layer, or external authoriser.
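Recording the tag can be a few lines. This sketch writes a risk class to both the registered model and one model version through any client that exposes MLflow's `set_registered_model_tag` and `set_model_version_tag` methods. I am assuming that in your workspace this would be an `mlflow.MlflowClient` pointed at the Unity Catalog registry; the tag key, the model name, and the `RecordingClient` stand-in are all hypothetical and exist only so the example runs without a workspace.

```python
ALLOWED_RISK_CLASSES = {"low", "medium", "high", "critical"}
RISK_TAG_KEY = "risk_class"  # hypothetical tag key; use your own naming standard


def record_risk_class(client, model_name, version, risk_class):
    """Tag both the registered model and one version with the risk class."""
    if risk_class not in ALLOWED_RISK_CLASSES:
        raise ValueError(f"unknown risk class: {risk_class!r}")
    client.set_registered_model_tag(name=model_name, key=RISK_TAG_KEY, value=risk_class)
    client.set_model_version_tag(
        name=model_name, version=version, key=RISK_TAG_KEY, value=risk_class
    )


class RecordingClient:
    """Stand-in for a real registry client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def set_registered_model_tag(self, name, key, value):
        self.calls.append(("model", name, key, value))

    def set_model_version_tag(self, name, version, key, value):
        self.calls.append(("version", name, version, key, value))


# Hypothetical three-level Unity Catalog name: catalog.schema.model
client = RecordingClient()
record_risk_class(client, "main.agents.claims_triage", "3", "high")
```

Validating against a fixed set before writing keeps free-text values such as "High-ish" out of the registry, which matters once people start querying on the tag.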
That is the line between registry and policy engine. Databricks gives you the governed assets, the permissions model, and the audit surface. If you need deterministic policy decisions over agent behaviour itself, you compose that on top.
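What "compose that on top" might look like, in its simplest form: a deterministic, default-deny lookup that an API gateway or middleware layer could call before every tool invocation. The policy table, tool names, and `authorise` function are illustrative assumptions, not a Databricks feature.

```python
from dataclasses import dataclass

# Hypothetical policy table: the tools each risk class may call.
# Anything not listed is denied, including whole risk classes.
TOOL_POLICY = {
    "low": {"search_kb"},
    "medium": {"search_kb", "draft_update"},
    "high": {"search_kb", "draft_update", "tool_a"},
}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorise(risk_class: str, tool: str) -> Decision:
    """Return an allow/deny decision with a reason that can be written to the audit log."""
    permitted = TOOL_POLICY.get(risk_class)
    if permitted is None:
        return Decision(False, f"no policy defined for risk class {risk_class!r}")
    if tool in permitted:
        return Decision(True, f"{tool!r} is approved for {risk_class!r} agents")
    return Decision(False, f"{tool!r} is not approved for {risk_class!r} agents")
```

With this table, `authorise("high", "tool_a")` is allowed and `authorise("high", "tool_b")` is denied. The same inputs always give the same answer, which is the property you want to be able to show an auditor.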
On the AWS-native path: Amazon Bedrock AgentCore Gateway with Cedar policies provides a similar enforcement pattern - deterministic allow/deny decisions on every tool call, with Lambda interceptors for dynamic validation. Cedar policies are authored in a declarative language and evaluated against principal, action, and resource with optional conditions over request context. Worth monitoring as it moves towards general availability.
Component 4: Autonomy Level - How Independently an Agent Operates
Risk class tells you what an agent is allowed to do. Autonomy level tells you how much independent judgement it is permitted to exercise in doing it. These are related but distinct. A High Risk agent can still operate at L1 with human approval, or at L3 with bounded multi-step execution. The risk class constrains the action space. The autonomy level constrains the operating model.
The five levels from the registry:
L0 - Assistive only. Generates output for human review. No execution.
L1 - Human-approved actions. The agent proposes; a human confirms before execution.
L2 - Semi-autonomous workflows. Executes within defined boundaries without per-action approval.
L3 - Goal-driven execution. Plans and executes multi-step tasks. Human oversight at checkpoints, not per action.
L4 - Multi-agent autonomy. Coordinates with other agents, spawns sub-agents, orchestrates complex workflows with minimal human intervention.
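One way to keep these levels from drifting into free text is to encode them as an ordered type. A minimal sketch, assuming the ordering L0 < L1 < ... < L4 is meaningful for comparisons; the helper function names are mine.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    L0 = 0  # assistive only, no execution
    L1 = 1  # human approves each action
    L2 = 2  # semi-autonomous within defined boundaries
    L3 = 3  # goal-driven, human oversight at checkpoints
    L4 = 4  # multi-agent autonomy, minimal human intervention


def may_execute(level: AutonomyLevel) -> bool:
    """L0 only generates output for review; every higher level can execute."""
    return level >= AutonomyLevel.L1


def needs_per_action_approval(level: AutonomyLevel) -> bool:
    """Only L1 requires a human to confirm each action before it runs."""
    return level == AutonomyLevel.L1
```

A registry value stored as the string "L3" can be parsed with `AutonomyLevel["L3"]`, and an unknown value raises `KeyError` instead of passing through silently.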
In Databricks, autonomy level should be treated as an explicit registry attribute you manage in your own governance model, not as a built-in Unity Catalog field with native enforcement semantics. The platform gives you the places to record it, such as model and model-version metadata, but the meaning of L0 through L4 remains an operating policy you define and then enforce through workflow design, approvals, deployment controls, and runtime boundaries.
MLflow Tracing captures the full execution path - every tool call, every decision point, every input and output - linked to the agent version and autonomy level at the time of execution. In a regulatory investigation, that trace is the evidence.
So, autonomy level is declared in the registry, promotion is gated through MLflow deployment workflows, runtime execution is bounded by Databricks permissions and isolated tool execution, and trace evidence is captured through MLflow. The registry records the autonomy decision; the surrounding control plane makes it real.
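The promotion gate itself can be very small. This sketch blocks a release when the autonomy level declared in the registry is missing, unrecognised, or higher than the level the business signed off on. The function name and the idea of a per-agent "approved ceiling" are my assumptions about how you might model the sign-off; wiring it into your actual deployment workflow is left to you.

```python
from typing import List, Optional

LEVELS = ["L0", "L1", "L2", "L3", "L4"]


def promotion_blockers(declared: Optional[str], approved_ceiling: str) -> List[str]:
    """Return the reasons a version must not be promoted; an empty list means proceed."""
    if approved_ceiling not in LEVELS:
        raise ValueError(f"unknown approved ceiling: {approved_ceiling!r}")
    if declared is None:
        return ["autonomy level is not declared in the registry"]
    if declared not in LEVELS:
        return [f"unknown autonomy level: {declared!r}"]
    if LEVELS.index(declared) > LEVELS.index(approved_ceiling):
        return [f"declared {declared} exceeds approved ceiling {approved_ceiling}"]
    return []
```

Returning reasons rather than a bare boolean means the release pipeline can show the owner exactly why a promotion was stopped.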
Open source alternative: Microsoft released the Agent Governance Toolkit in April 2026 under MIT licence, an open source project that addresses runtime security governance for autonomous agents. It includes execution rings inspired by CPU privilege levels, kill switches for emergency agent termination, and circuit breakers. It maps directly to the OWASP Top 10 for Agentic Applications (2026) and is the most purpose-built open source option currently available for autonomy-level enforcement.
What the Registry Now Does
Across both parts, the four components work as a single control system.
Schema tells the registry what the agent is. Ownership tells it who is accountable. Risk Class tells it what level of control the agent requires, and that classification is carried as governed metadata across the agent’s registered assets. Autonomy Level tells it how much independent judgement the agent is permitted to exercise, and that decision is enforced through release workflows, runtime permissions, and execution boundaries rather than left implicit in code.
The audit request scenario from Part 1 - fourteen agents, nobody can answer - is no longer possible. Every agent has a schema record, an owner, a declared risk class, an explicit autonomy level, governed access to tools and data, and an execution trail that can be reconstructed through MLflow and platform audit logs. The registry is not a reporting artefact. It is the control plane that makes autonomous operation in a regulated environment defensible.
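A quick way to test that claim against your own estate is a completeness check across all registry records. The sketch below assumes each record is a plain dictionary keyed by the four components; the field names and the two sample agents are hypothetical.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("schema", "owner", "risk_class", "autonomy_level")


def audit_gaps(registry: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each agent to the required fields that are missing or empty."""
    gaps = {}
    for agent, record in registry.items():
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            gaps[agent] = missing
    return gaps


# Hypothetical registry export.
registry = {
    "claims_triage": {
        "schema": "v2",
        "owner": "claims-ops",
        "risk_class": "high",
        "autonomy_level": "L1",
    },
    "kb_assistant": {"schema": "v1", "owner": "", "risk_class": "low"},
}

gaps = audit_gaps(registry)
```

Here `gaps` contains only `kb_assistant`, flagged for an empty owner and a missing autonomy level. An empty result is what lets you answer the audit request with confidence.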
P.S. If you’re new here - welcome 🎉. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it’s always: what can you use Monday morning.
Talk soon,
Sandi.



