
Agent Security Model

BaselineOS provides defense-in-depth for AI agent operations. Each layer is designed to contain the blast radius if any single component is compromised.


AI agents operate with real credentials and real infrastructure access. The threats:

| Threat | Example | Traditional defense | BaselineOS defense |
|---|---|---|---|
| Credential exfiltration | Prompt injection causes agent to leak API keys | Don’t put keys in prompts | Opaque grants: agent never sees raw values |
| Credential persistence | Compromised agent retains access indefinitely | Rotate keys manually | Time-boxed leases: auto-expire in seconds |
| Lateral movement | Agent with deploy access uses it for data exfiltration | Network segmentation | Per-agent scoping: each agent sees only its credentials |
| Privilege escalation | Low-trust agent performs high-risk operations | RBAC | Trust scoring: higher-risk steps require higher trust |
| No accountability | Unknown agent accessed production credentials | Manual log review | Persistent audit trail: every access, every accessor, every timestamp |
| Uncontrolled execution | Agent invents its own workflow with arbitrary commands | Hope for the best | Execution protocols: agent follows defined steps with defined access |

┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: Execution Protocols │
│ Agent follows defined steps, not freeform commands │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: Trust Scoring │
│ Per-protocol, per-step, per-credential trust gates │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: Opaque Credential Grants │
│ Agent gets handle, not value — can use but can't exfiltrate │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: Time-Boxed Leases │
│ Credentials auto-expire after step timeout │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: Encrypted Vault │
│ AES-256-CBC at rest, per-agent scoping, persistent audit trail │
└─────────────────────────────────────────────────────────────────┘

Each layer contains failures in the layers above it:

  • Protocol defines the steps → but what if the step is compromised?
  • Trust scoring gates access → but what if trust is too high?
  • Opaque grants hide values → but what if the runner is compromised?
  • Leases auto-expire → but what if the vault is compromised?
  • Encryption protects storage → bottom of the stack
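As an illustration of the bottom layer, the following is a minimal sketch of AES-256-CBC encryption at rest using Node's built-in `crypto` module. The function names `encryptSecret` and `decryptSecret`, the record shape, and the hex encoding are assumptions made for this example; the source does not describe how the vault actually stores records or manages its key.

```javascript
const crypto = require("crypto");

// Hypothetical helpers: names and record layout are illustrative, not BaselineOS APIs.
function encryptSecret(plaintext, key) {
  const iv = crypto.randomBytes(16); // fresh 16-byte IV for every record
  const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // The IV is not secret and is stored next to the ciphertext.
  return { iv: iv.toString("hex"), ciphertext: ciphertext.toString("hex") };
}

function decryptSecret(record, key) {
  const iv = Buffer.from(record.iv, "hex");
  const decipher = crypto.createDecipheriv("aes-256-cbc", key, iv);
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(record.ciphertext, "hex")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}

// AES-256 needs a 32-byte key; a random one stands in for real key management here.
const key = crypto.randomBytes(32);
const record = encryptSecret("example-secret-value", key);
const roundTrip = decryptSecret(record, key);
```

Only `record` would be written to disk in this sketch; the plaintext exists only in memory while a lease is being served.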

What Happens When Something Is Compromised


Agent is compromised (prompt injection, jailbreak)


Traditional: Agent has VERCEL_TOKEN=vcel_... in its environment. Attacker has the token forever.

BaselineOS:

  1. Agent has a lease handle (lease-xxx), not the token value
  2. The lease expires in seconds (step timeout)
  3. Even if the attacker extracts the handle, it resolves to nothing after the step ends
  4. The agent never had the raw value in its memory or context
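The lease behaviour in the steps above can be sketched as a small in-memory table. `LeaseTable` and its methods are hypothetical names for this example, and the counter-based handle is a simplification; a real implementation would presumably use unguessable random handles.

```javascript
// Hypothetical sketch: LeaseTable is not a BaselineOS class.
class LeaseTable {
  constructor(now = () => Date.now()) {
    this.now = now;          // injectable clock so expiry can be demonstrated
    this.leases = new Map(); // handle -> { secret, expiresAt }
    this.counter = 0;
  }

  // Returns an opaque handle; the secret itself never leaves the table.
  issue(secret, ttlMs) {
    this.counter += 1;
    const handle = `lease-${this.counter}`; // assumption: real handles would be random
    this.leases.set(handle, { secret, expiresAt: this.now() + ttlMs });
    return handle;
  }

  // Called by the runner, not the agent. Expired or unknown handles resolve to undefined.
  resolve(handle) {
    const lease = this.leases.get(handle);
    if (!lease) return undefined;
    if (this.now() >= lease.expiresAt) {
      this.leases.delete(handle);
      return undefined;
    }
    return lease.secret;
  }
}

let clock = 0; // fake clock in milliseconds
const table = new LeaseTable(() => clock);
const handle = table.issue("vcel_example_token", 60_000);
const duringStep = table.resolve(handle); // still inside the 60s window
clock = 60_001;
const afterStep = table.resolve(handle);  // the window has passed
```

The agent only ever holds `handle`, so anything it leaks stops resolving once the TTL has elapsed.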

Traditional: Leaked ANTHROPIC_API_KEY is valid until manually rotated. Could be months.

BaselineOS:

  1. Leased credentials auto-expire after TTL (seconds to minutes)
  2. Vault audit trail shows exactly when and how the credential was accessed
  3. vault.rotate('anthropic', newKey) immediately invalidates all active leases
  4. vault.getAccessLog() shows the full forensic trail
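A toy model of the rotation and audit behaviour described above might look like the following. `MiniVault` is a stand-in written for this example; it borrows the method names `rotate` and `getAccessLog` from the text, but its internals, the `store`, `lease`, and `resolve` methods, and the log entry shape are all assumptions.

```javascript
// Hypothetical stand-in for the vault; not the real CredentialVault.
class MiniVault {
  constructor() {
    this.secrets = new Map(); // credential name -> current value
    this.leases = new Map();  // handle -> credential name
    this.log = [];
    this.nextId = 1;
  }

  record(action, name, accessor) {
    this.log.push({ action, name, accessor, at: new Date().toISOString() });
  }

  store(name, value) {
    this.secrets.set(name, value);
    this.record("store", name, "admin");
  }

  lease(name, accessor) {
    const handle = `lease-${this.nextId++}`;
    this.leases.set(handle, name);
    this.record("lease", name, accessor);
    return handle;
  }

  resolve(handle) {
    const name = this.leases.get(handle);
    return name === undefined ? undefined : this.secrets.get(name);
  }

  // Replaces the value and drops every active lease on that credential.
  rotate(name, newValue) {
    this.secrets.set(name, newValue);
    for (const [handle, leasedName] of this.leases) {
      if (leasedName === name) this.leases.delete(handle);
    }
    this.record("rotate", name, "admin");
  }

  getAccessLog() {
    return [...this.log]; // copy, so callers cannot edit the trail
  }
}

const vault = new MiniVault();
vault.store("anthropic", "old-key");
const oldHandle = vault.lease("anthropic", "research-agent");
const beforeRotation = vault.resolve(oldHandle);
vault.rotate("anthropic", "new-key");
const afterRotation = vault.resolve(oldHandle);
const actions = vault.getAccessLog().map((entry) => entry.action);
```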

Traditional: Agent reads process.env.DATABASE_URL — nothing stops it.

BaselineOS:

  1. Agent requests capability through protocol → trust score checked
  2. Agent trust 60 < required 85 for production database → denied
  3. Denial logged to audit trail with agent ID, trust score, and reason
  4. Agent cannot bypass — credential is not in env, only in vault
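The trust check in these steps can be approximated with a single gate function that writes every decision to an audit log. `requestCredential`, the policy object, and the log entry fields are hypothetical; only the threshold values (60 versus 85) and the `allowedAccessors` idea come from the text.

```javascript
// Hypothetical sketch of a trust gate; not a BaselineOS API.
const auditLog = [];

function requestCredential(agent, name, policy) {
  const entry = {
    agentId: agent.id,
    credential: name,
    trustScore: agent.trustScore,
    at: new Date().toISOString(),
  };

  // Scope check: the agent must be listed when the policy restricts accessors.
  if (policy.allowedAccessors && !policy.allowedAccessors.includes(agent.id)) {
    auditLog.push({ ...entry, decision: "denied", reason: "accessor not in allowedAccessors" });
    return { granted: false };
  }

  // Trust check: the agent's score must meet the credential's minimum.
  if (agent.trustScore < policy.minTrustScore) {
    const reason = `trust ${agent.trustScore} < required ${policy.minTrustScore}`;
    auditLog.push({ ...entry, decision: "denied", reason });
    return { granted: false };
  }

  auditLog.push({ ...entry, decision: "granted", reason: "policy satisfied" });
  return { granted: true };
}

const prodDb = { minTrustScore: 85, allowedAccessors: ["data-agent"] };
const result = requestCredential({ id: "data-agent", trustScore: 60 }, "prod-db", prodDb);
```

Because the denial is recorded before the function returns, the audit trail captures failed attempts as well as successful ones.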

Organization defines:
    defineBaseline({
      credentials: {
        vercel: { envVar: 'VERCEL_TOKEN', minTrustScore: 60 },
        'prod-db': { envVar: 'DATABASE_URL', minTrustScore: 85, scope: 'agent', allowedAccessors: ['data-agent'] },
      }
    })

CredentialVault:
    Auto-resolves from env
    Encrypts with AES-256-CBC
    Persists to SQLite

Protocol defines steps:
    S1: "Run tests" (no credentials needed)
    S2: "Deploy" (needs vercel-token, trust 60)
    S3: "Migrate DB" (needs prod-db, trust 85)

Agent starts execution (trust score: 75):
    S1: ✓ passes (no credentials)
    S2: ✓ vercel-token → lease issued (TTL 60s) → handle to agent → runner injects → lease revoked
    S3: ✗ prod-db → DENIED (trust 75 < required 85) → logged → step fails
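The walk-through above can be replayed with a short simulation. The `runProtocol` function and the shape of the step objects are invented for this example, and stopping at the first denied step is an assumption; the source only says that the step fails.

```javascript
// Hypothetical simulation of the S1-S3 flow; step and policy shapes are illustrative.
const credentials = {
  "vercel-token": { minTrustScore: 60 },
  "prod-db": { minTrustScore: 85 },
};

const protocol = [
  { id: "S1", name: "Run tests" },
  { id: "S2", name: "Deploy", needs: "vercel-token" },
  { id: "S3", name: "Migrate DB", needs: "prod-db" },
];

function runProtocol(steps, trustScore) {
  const trace = [];
  for (const step of steps) {
    if (!step.needs) {
      trace.push(`${step.id}: passed (no credentials)`);
      continue;
    }
    const required = credentials[step.needs].minTrustScore;
    if (trustScore < required) {
      trace.push(`${step.id}: DENIED (trust ${trustScore} < required ${required})`);
      break; // assumption: a denied step halts the rest of the protocol
    }
    // A real runner would issue a lease here, inject the secret, then revoke it.
    trace.push(`${step.id}: lease issued for ${step.needs}, revoked after step`);
  }
  return trace;
}

const trace = runProtocol(protocol, 75);
```

With a trust score of 75, the trace records S1 and S2 succeeding and S3 being denied, matching the flow described above.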

The Context Integrity Engine scans all context entries for embedded secrets:

// These patterns are rejected on context registration:
const secretPatterns = [
  /(?:sk-|pk-|api[_-]?key)[a-zA-Z0-9\-_]{20,}/,  // API keys
  /(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}/,   // GitHub tokens
  /xox[bpors]-[A-Za-z0-9\-]+/,                   // Slack tokens
  /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/,    // Private keys
  /AKIA[0-9A-Z]{16}/,                            // AWS access keys
];

If an agent tries to register context containing secrets, it’s blocked before storage.
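One way such a check could be wired into registration is sketched below. The regular expressions are copied from the list above; the labels, `findSecrets`, `registerContext`, and the in-memory `contextStore` are hypothetical names for this example, and the sample key is the placeholder AWS uses in its own documentation.

```javascript
// Patterns copied from the document; everything else here is illustrative.
const labelledPatterns = [
  { label: "API key", pattern: /(?:sk-|pk-|api[_-]?key)[a-zA-Z0-9\-_]{20,}/ },
  { label: "GitHub token", pattern: /(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}/ },
  { label: "Slack token", pattern: /xox[bpors]-[A-Za-z0-9\-]+/ },
  { label: "Private key", pattern: /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/ },
  { label: "AWS access key", pattern: /AKIA[0-9A-Z]{16}/ },
];

// Returns the labels of every pattern that matches the text.
function findSecrets(text) {
  return labelledPatterns
    .filter(({ pattern }) => pattern.test(text))
    .map(({ label }) => label);
}

const contextStore = [];

// Rejects the entry before anything is written to the store.
function registerContext(text) {
  const hits = findSecrets(text);
  if (hits.length > 0) {
    throw new Error(`context rejected: contains ${hits.join(", ")}`);
  }
  contextStore.push(text);
}

const dirtyHits = findSecrets("deploy notes: AKIAIOSFODNN7EXAMPLE");
const cleanHits = findSecrets("Deploy target is the staging cluster");
registerContext("Deploy target is the staging cluster");
```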


The /api/context/* endpoints enforce the following controls:

| Control | Implementation |
|---|---|
| Authentication | API key required (Bearer token or X-API-Key header) |
| Authorization | authorizedBy derived from auth context, not user input |
| Rate limiting | 20 destructive operations per hour |
| Input validation | Content size limits (100KB), find/replace limits (10K chars) |
| Response sanitization | Snapshots stripped; no sensitive content in responses |
| Scope validation | Enum-validated scope parameter |
| Body size limit | 1MB max request body |
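Two of these controls, the rate limit and the content size limit, could be implemented along the following lines. This is a sketch only: `DestructiveOpLimiter`, `contentWithinLimit`, the per-key sliding window, and the reading of 100KB as 100 × 1024 bytes are all assumptions, since the source gives the limits but not how they are counted.

```javascript
// Hypothetical sketch; the limits come from the table, the mechanics are assumed.
const WINDOW_MS = 60 * 60 * 1000;      // one hour
const MAX_DESTRUCTIVE_OPS = 20;
const MAX_CONTENT_BYTES = 100 * 1024;  // assumption: 100KB means 100 * 1024 bytes

class DestructiveOpLimiter {
  constructor() {
    this.history = new Map(); // apiKey -> timestamps of recent destructive ops
  }

  // Sliding window: only operations from the last hour count against the limit.
  allow(apiKey, nowMs) {
    const recent = (this.history.get(apiKey) || []).filter((t) => nowMs - t < WINDOW_MS);
    const allowed = recent.length < MAX_DESTRUCTIVE_OPS;
    if (allowed) recent.push(nowMs);
    this.history.set(apiKey, recent);
    return allowed;
  }
}

function contentWithinLimit(content) {
  return Buffer.byteLength(content, "utf8") <= MAX_CONTENT_BYTES;
}

const limiter = new DestructiveOpLimiter();
const firstTwenty = [];
for (let i = 0; i < 20; i += 1) firstTwenty.push(limiter.allow("key-a", i));
const twentyFirst = limiter.allow("key-a", 20);
const afterWindow = limiter.allow("key-a", WINDOW_MS + 20);
```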

| Standard | Control | BaselineOS Implementation |
|---|---|---|
| OWASP Agentic Top 10 | Tool Misuse | Execution protocols define allowed tools per step |
| OWASP Agentic Top 10 | Prompt Injection | Input sanitizer (11 patterns), opaque credentials |
| OWASP Agentic Top 10 | Excessive Agency | Trust scoring gates autonomous actions |
| SOC 2 | Access Control | Per-agent credential scoping, trust scoring |
| SOC 2 | Audit Trail | SQLite-persisted access log, lease lifecycle |
| SOC 2 | Encryption | AES-256-CBC at rest, TLS in transit |
| GDPR Art. 32 | Security of Processing | Encryption, access control, audit trail |
| ISO 27001 | A.9 Access Control | Trust-scored, scoped, time-boxed access |