Most security thinking still treats AI as a feature inside an application. That framing is already obsolete. An autonomous agent is not a feature — it is an actor. It holds credentials, makes decisions, calls tools, and triggers effects in the world. Once you see agents as a new class of privileged, non-human actor, the security questions change: not "is this output safe to display?" but "what is this actor allowed to do, with whose authority, and how do we contain it when it misbehaves?"
Meta3Agents was designed around that reframing. Security is not a layer applied on top of the agents; it is the boundary the agents run inside. Here is how the pieces fit.
You cannot secure what you cannot identify. Each agent operates with an identity, access is governed by role-based access control, and credentials are handled as hashed tokens rather than passed around in the clear. Critically, agent-to-agent calls are signed — when one agent invokes another, the call carries verifiable provenance. This closes a gap specific to multi-agent systems: in a network of cooperating agents, an unsigned internal call is an unauthenticated instruction, and unauthenticated instructions are how lateral movement happens.
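As a rough illustration of what a signed agent-to-agent call can look like, here is a minimal sketch using an HMAC over a canonical JSON payload. The source does not say which signature scheme the platform uses, so the shared-secret approach, the envelope shape, and the names `sign_call` and `verify_call` are all assumptions made for illustration. A production design would likely also add a nonce or timestamp to resist replay, which this sketch omits.

```python
import hashlib
import hmac
import json


def sign_call(secret: bytes, caller: str, skill: str, args: dict) -> dict:
    """Wrap a call in an envelope carrying an HMAC-SHA256 signature (hypothetical format)."""
    # Canonical serialization so signer and verifier hash identical bytes.
    payload = json.dumps(
        {"caller": caller, "skill": skill, "args": args},
        sort_keys=True,
        separators=(",", ":"),
    )
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_call(secret: bytes, envelope: dict) -> dict:
    """Return the decoded call if the signature checks out, otherwise refuse it."""
    expected = hmac.new(
        secret, envelope["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope["signature"]):
        raise PermissionError("signature mismatch: rejecting unauthenticated call")
    return json.loads(envelope["payload"])
```

The receiving agent calls `verify_call` before acting, so an internal message whose payload was altered in transit, or that was never signed with the right key, is rejected rather than treated as an instruction.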
An agent should be able to do exactly what its job requires and nothing more. On the platform, capability is bounded by skill allowlists: an agent can invoke only the skills it has been explicitly permitted, not the full surface of everything the system can do. This is least privilege applied to a new kind of subject. It means a compromised or misbehaving agent is confined to its allowlisted skills — the blast radius is bounded by design rather than by hoping nothing goes wrong.
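The allowlist idea can be sketched in a few lines. The skill registry, the agent identifier, and the `invoke` function below are hypothetical names chosen for illustration, not the platform's actual API; the point is only the default-deny check that sits in front of every skill call.

```python
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical registry of everything the system is able to do.
SKILLS: Dict[str, Callable[..., Any]] = {
    "summarize": lambda text: text[:40],
    "send_email": lambda to, body: f"sent to {to}",
}

# Hypothetical per-agent allowlists: each agent sees only a subset of SKILLS.
ALLOWLISTS: Dict[str, FrozenSet[str]] = {
    "research-agent": frozenset({"summarize"}),
}


def invoke(agent_id: str, skill: str, **kwargs: Any) -> Any:
    """Run a skill only if it appears on the calling agent's allowlist."""
    # Unknown agents get an empty allowlist, so the default is deny.
    allowed = ALLOWLISTS.get(agent_id, frozenset())
    if skill not in allowed:
        raise PermissionError(f"{agent_id} is not permitted to invoke {skill}")
    return SKILLS[skill](**kwargs)
```

Here `invoke("research-agent", "summarize", text="...")` succeeds, while the same agent asking for `send_email` raises `PermissionError` even though the skill exists in the system.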
When agents execute code or tools, they do so in sandboxed, non-root containers. The principle is blast-radius limitation: assume any given execution could go wrong, and ensure that when it does, the damage is contained to a disposable, unprivileged environment rather than spreading into the host or adjacent systems. Non-root execution in particular removes an entire category of escalation risk. Containment is not a response to a specific known threat; it is the default posture toward all of them.
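To make the containment posture concrete, the sketch below builds a `docker run` command line with a non-root user and a reduced privilege set. The source does not name the container runtime or its exact settings, so the use of Docker, the specific flags, and the helper name `sandbox_command` are assumptions. The function only constructs the argument list; running it (for example with `subprocess.run`) is left to the caller.

```python
from typing import List


def sandbox_command(image: str, command: List[str], uid: int = 65534) -> List[str]:
    """Build a `docker run` argv for a disposable, unprivileged container (illustrative)."""
    if uid == 0:
        raise ValueError("refusing to run the sandbox as root")
    return [
        "docker", "run",
        "--rm",                                  # discard the container afterwards
        "--user", f"{uid}:{uid}",                # non-root UID and GID
        "--read-only",                           # root filesystem mounted read-only
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--network", "none",                     # no network unless explicitly granted
        image,
        *command,
    ]
```

A call such as `sandbox_command("python:3.12-slim", ["python", "-c", "print(1)"])` yields an argument list in which each flag narrows what a failed or hostile execution can reach. The image name is only an example.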
Security is not only prevention — it is also detection and forensics, and that is where the audit trail does double duty. Every decision lands in a hash-chained, replayable log with structured traces of the path through the system. Hash-chaining makes the record tamper-evident, which matters enormously after an incident: the first thing you need is a trustworthy account of what actually happened, and a log an attacker could quietly rewrite provides no such thing. Watchdog supervision monitors the running system so anomalies surface rather than accumulate silently.
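The tamper-evidence property of a hash-chained log can be shown with a small sketch. The entry layout and the function names `append` and `verify` are illustrative assumptions, not the platform's log format. Each entry's hash covers the previous entry's hash, so editing any earlier record breaks every link after it.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    """Hash the event together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Add an event, linking it to the current tail of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

One caveat on the design: a chain like this is only tamper-evident if the latest hash is also recorded somewhere the attacker cannot reach, since anyone who can rewrite the entire log could otherwise recompute all the hashes.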
One threat deserves singling out because it is specific to language-model agents: prompt injection, where adversarial content in a document, web page, or message tries to hijack an agent into doing something it should not. The naive framing treats this as a content-filtering problem: catch the bad instruction before the model sees it. Filtering alone cannot win that arms race, because there is no practical limit to the ways a malicious instruction can be phrased.
The more durable defense is architectural, and it falls out of the controls already described. Even if an injected instruction successfully persuades an agent to attempt something harmful, the agent can only invoke its allowlisted skills, executes inside a sandboxed non-root container, cannot exceed the authority its graduation state has earned, and trips a human approval gate the moment it reaches for a consequential action. Injection might compromise an agent's intent; it does not grant the agent capabilities or authority it did not already have. Containing the actor matters more than perfectly filtering its inputs — because you will never filter perfectly, but you can always bound what a compromised actor is able to do.
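The way these checks compose can be sketched as a single authorization decision. Everything named here is hypothetical: the `Request` type, the `CONSEQUENTIAL` set, and the three outcome strings are illustration only, and the graduation-state check is left out because the source does not describe how it is represented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical set of skills treated as consequential actions.
CONSEQUENTIAL: FrozenSet[str] = frozenset({"transfer_funds", "delete_records"})


@dataclass(frozen=True)
class Request:
    agent_id: str
    skill: str
    approved_by: Optional[str] = None  # a human approver, if one has signed off


def authorize(request: Request, allowlist: FrozenSet[str]) -> str:
    """Decide on a request regardless of why the agent is making it."""
    if request.skill not in allowlist:
        return "deny"  # injected intent cannot add skills to the allowlist
    if request.skill in CONSEQUENTIAL and request.approved_by is None:
        return "needs_approval"  # consequential actions wait for a human
    return "allow"
```

Note that the decision never inspects the prompt. Whether the agent was persuaded by a legitimate user or by injected text, the outcome depends only on what the agent is permitted to do.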
Detection is only useful if you can act on it. A supreme kill-switch sits above anything that touches the real world, giving you the ability to halt agent action immediately. Treat it as the security equivalent of a circuit breaker: when something is going wrong faster than you can diagnose it, the correct first move is to stop the actor, then investigate. Authority can be extended generously precisely because it can be revoked instantly.
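The circuit-breaker analogy can be sketched as a flag that every real-world effect must check before it runs. The `KillSwitch` class, the `Halted` exception, and `perform_effect` are hypothetical names for illustration. The source does not describe how the platform's kill-switch is implemented, and a real one would need to work across processes and hosts rather than inside a single Python process.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Halted(RuntimeError):
    """Raised when an effect is attempted after the switch has been tripped."""


class KillSwitch:
    def __init__(self) -> None:
        self._halted = threading.Event()  # safe to set and read across threads
        self.reason: Optional[str] = None

    def trip(self, reason: str) -> None:
        """Stop all further effects immediately; diagnosis comes later."""
        self.reason = reason
        self._halted.set()

    def is_tripped(self) -> bool:
        return self._halted.is_set()


def perform_effect(switch: KillSwitch, effect: Callable[[], T]) -> T:
    """Single choke point through which every real-world action passes."""
    if switch.is_tripped():
        raise Halted(f"kill-switch tripped: {switch.reason}")
    return effect()
```

Routing every effect through one guarded function is what makes revocation instant: tripping the switch does not require finding and stopping each agent individually.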
Where the system runs is itself a security decision. Self-hosted keeps data and execution entirely within your own boundary; managed runs on infrastructure we operate; and enterprise multi-tenant adds tenant isolation. Delivery is containerized with CI/CD, and TLS is terminated by Caddy. The Deployment models page covers the trade-offs in full, and the right isolation model depends on the sensitivity of the data and your regulatory context.
No single control here is sufficient alone, and that is the point. Identity, least-privilege allowlists, sandboxed execution, a tamper-evident audit trail, a kill-switch, and deployment isolation compose into defense in depth: an attacker or a misbehaving agent has to defeat every layer, while a defender needs only one layer to hold. That is the right security model for a privileged, autonomous, probabilistic actor.
The full Security architecture page documents each control in detail, and the Trust Center connects these mechanisms back to the governance framework they enforce. We do not publish compliance certifications we have not completed — request a security walkthrough for current status.